SUMMARY


This report details an investigation of video data compression applied to 
microgravity space experiments using High Resolution High Frame Rate Video 
Technology (HHVT). An extensive survey of methods of video data compression, 
described in the open literature, was conducted. The survey examines compression 
methods employing digital computing. The results of the survey are presented. They 
include a description of each method and an assessment of image degradation and 
video data parameters. An assessment is made of present and near term future 
technology for implementation of video data compression in a high speed imaging 
system. Results of the assessment are discussed and summarized in a tabular listing 
of implementation status. 

The results of a study of a baseline HHVT video system, and approaches for 
implementation of video data compression, are presented. Case studies of three 
microgravity experiments are presented and specific compression techniques and 
implementations are recommended. 

The investigation concludes that video data compression approaches for microgravity
space experiments are experiment-specific in their requirements and that no single
approach is universally optimal. It is shown, for the experiments studied, that the
required data compression separates into two approaches: the first limits data rates
for storage, and the second reduces data rates for transmission. For experiments
requiring high resolution and/or high frame rates with real-time compression,
hardware implementations are currently limited, by technology, to methods that can
be implemented using parallel processing and simple compression algorithms.
Although focal plane processing is theoretically attractive, no approach relying on
it alone could be identified that could be implemented with state-of-the-art hardware.
ACRONYMS

A

AC          Alternating Current, the remaining coefficients in an image transform
A/D         Analog to Digital conversion
ASIC        Application Specific Integrated Circuit
A-VQ        Address-Vector Quantization

B

BTC         Block Truncation Coding

C

CAQ         Constant Area Quantization
CCIR        Consultative Committee, International Radio
CCD         Charge Coupled Device
CID         Charge Injection Device
CMOS        Complementary Metal Oxide Semiconductor
CR          Conditional Replenishment or Compression Ratio

D

D/A         Digital to Analog conversion
DC          Direct Current, refers to the average pixel value in a transform
DCT         Discrete Cosine Transform
DCT/DPCM    Discrete Cosine Transform/Differential Pulse Code Modulation
DCT/MC      Discrete Cosine Transform/Motion Compensation
DFT         Discrete Fourier Transform
DM          Delta Modulation
DPCM        Differential Pulse Code Modulation
DSP         Digital Signal Processing

G

GaAs        Gallium Arsenide, a high speed semiconductor device
GSP         Graphics System Processor

H

HDTV        High Definition Television
HHVT        High Resolution, High Frame Rate Video Technology
HVS         Human Visual System
I

IDS         Intensity Dependent Spread

K

KLT         Karhunen-Loeve Transform

L

LPC         Linear Predictive Coding
LPF         Low Pass Filter
LZW         Lempel-Ziv-Welch algorithm

M

MAPS        Micro-adaptive Picture Sequencing
MC          Motion Compensation
MC/2D-DCT   Motion Compensation/Two-Dimensional Transform
MNVC        Minimum Noise Visibility Coding
MSE         Mean Square Error

N

NMSE        Normalized Mean Square Error
NTSC        National Television System Committee

O

O2/N2       Oxygen/Nitrogen gas atmosphere

P

PAL         Phase Alternation Line, color television system designed by Telefunken
PCAQ        Predictive Constant Area Quantization
PCM         Pulse Code Modulation
PE          Processing Element
Pixel       Picture Element
PROM        Pockel's Readout Optical Modulator
PRN         Pseudo-Random Noise
PSC         Perceptual Space Coding

R

RAM         Random Access Memory
RGB         Red Green Blue, a common digital color coordinate system
RLC         Run-length Coding
RS 170      Electronic Industries Association performance standards for
            monochrome display systems; RS 170A applies to color displays.

S

SNR         Signal-to-Noise Ratio
SQ          Scalar Quantization
SS          Space Station
SSF         Space Station Freedom

T

TDRSS       Tracking and Data Relay Satellite System
2-D         Two-dimensions
3-D         Three-dimensions

V

VDC         Video Data Compression
VLSI        Very Large Scale Integrated circuits
VQ          Vector Quantization

W

WHT         Walsh-Hadamard Transform

X

XFORM       Transform

Y

YIQ         A color coordinate system employed in broadcast television:
            the Y-component is luminance; the I- and Q-components are
            chrominance.
UNITS

This list contains abbreviations for units of measure used throughout this report.

B           byte
b           bit
bpp         bits per pixel
Byte/p      Bytes per pixel
cm          centimeter
dB          decibel
fr/s        frames per second
fsc         frequency of subcarrier
GB          Gigabytes (10^9 bytes)
Gp          Gigapixels (10^9 pixels)
hr          hour
lines/mm    lines per millimeter, a measure of optical resolution
Mb          Megabits (10^6 bits)
Mbps        Megabits per second (10^6 bits/sec)
MHz         Megahertz (10^6 cycles per second)
min         minute
Mp/s        Megapixels per second (10^6 pixels/sec)
ns          nanosecond (10^-9 seconds)
p/fr        pixels per frame
sec         second
COMPRESSION TECHNIQUES


Introduction 

The performance of each of the video data compression techniques presented here 
is very much dependent upon the statistics of the image data. Some of the relevant 
statistics include the entropy of the data within a frame, the correlation of picture 
elements (pixels) within and/or between frames, and the amount of detail and/or motion 
contained in a frame. 

Performance Results 

In this report we will include some of the experimental results from computer 
simulations of various techniques being applied to actual images. These results, 
including bit rates, compression ratios, error measurements, and image quality 
judgments, are simply the data collected from a small number of experiments 
performed with specific images. Many of the images are from standard, RS 170 
television signals. For the images in microgravity experiments, these results may 
differ greatly. Also, the results of these techniques are expected to change with 
increased image resolution. The Baseline HHVT system employs a camera with 
resolution greater than RS 170 images. Even if the results are found to be similar, 
the images to be produced from the various microgravity experiments are extremely 
dissimilar in content to both broadcast television and aerial reconnaissance images
which are commonly used for simulations. Therefore, the performance data of the 
image compression techniques we will present may not be indicative of their 
performance in the context of an HHVT system. 

Evaluation Criteria 

The reproduced images from the simulations of codecs are usually evaluated
either on an objective (i.e., minimum error) basis or for subjective image appearance.
Some of the objective error measures are

$$\mathrm{MSE} = \frac{1}{N} \sum_{x,y} \left[ i_r(x,y) - i_o(x,y) \right]^2$$

$$\mathrm{NMSE} = \frac{\mathrm{MSE}}{i_{max}^2}$$

$$\mathrm{SNR} = 10 \log_{10} \frac{i_{max}^2}{\mathrm{MSE}}$$

where $i_r$ is the reconstructed intensity value, $i_o$ is the original intensity value,
$i_{max}$ is the maximum possible intensity value, and $N$ is the number of picture
elements (pixels) in the image.
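
As an illustration of how such objective measures might be computed, the following is a minimal sketch and not the procedure used in the literature surveyed. It assumes that NMSE normalizes the MSE by the squared maximum intensity and that SNR is the peak form in decibels; the function name `error_measures` and the default `i_max=255` (8-bit data) are hypothetical choices made for this example.

```python
import math

def error_measures(original, reconstructed, i_max=255):
    """Return (MSE, NMSE, SNR in dB) for two equal-length pixel sequences.

    Assumptions (not confirmed by the report): NMSE = MSE / i_max**2 and
    SNR = 10*log10(i_max**2 / MSE).
    """
    if len(original) != len(reconstructed) or not original:
        raise ValueError("images must be non-empty and the same size")
    n = len(original)
    # Mean of the squared differences between reconstructed and original pixels.
    mse = sum((r - o) ** 2 for o, r in zip(original, reconstructed)) / n
    nmse = mse / i_max ** 2
    # A perfect reconstruction has no error, so the ratio is unbounded.
    snr_db = math.inf if mse == 0 else 10 * math.log10(i_max ** 2 / mse)
    return mse, nmse, snr_db

mse, nmse, snr_db = error_measures([10, 20, 30, 40], [12, 18, 30, 44])
```

For an experiment that extracts a physical quantity from the image, the same comparison could instead be made on the extracted quantity rather than on pixel intensities.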

For microgravity experiment applications, the interest is usually in obtaining 
measurements of specific physical quantities from the reconstructed data. Therefore, 
the evaluation criteria may be very different. 
Spectral Information

Nassau [1] gives three uses for the word "color". For the purposes of this report, 
we mean "a class of sensations" produced by the human visual system (HVS). An 
image can appear to the HVS to have colors that are identical to the real scene being 
imaged even though the actual spectral composition of the reproduced image may be 
very different from that of the actual light incident on the sensor(s). Apparently, this 
is possible because the human eye has only three types of color sensors (cones) each 
of which responds to red, green, or blue light, with some overlap in the spectral 
sensitivities. Therefore, the color information available to higher levels of the HVS 
consists solely of three scalar values. This is not enough information to uniquely 
determine the spectral distribution of the incident light. A specific combination of 
sensor outputs produces a specific color sensation regardless of the spectral 
distribution of the light that caused the sensor responses. 

This phenomenon of the HVS is the basis of the standard RGB system of producing 
color television pictures. Most of the colors that can be perceived by the HVS can be 
produced by a combination of the proper proportions of red, green, and blue 
monochromatic light. The reproduced colors will match the originals well if the spectral 
responses of the camera sensors closely approximate the spectral responses of the 
cones in the human eye. The match does not have to be exact since the HVS has a 
perception threshold for the discrimination of color. Two RGB signals that differ by 
amounts under the threshold will produce the same color. The threshold is not identical
for the three components of the signal. It is also dependent on the color. 

An experimenter may be more concerned with the spectral distribution of the 
light source or the surface spectral reflection of the objects being imaged than in the 
color perceived by a human observer. In general, it is not possible to uniquely determine
this information using a finite number of sensors and/or filters with different spectral 
responses. If the interest is only in a finite number of discrete frequencies, the 
intensities can be uniquely determined from an equal number of sensors. 

Wandell [2] describes a method for extracting spectral information from a 
multi-spectral video signal consisting of the outputs of a finite number of sensors. This 
method approximates the spectral curve (intensity vs. frequency or intensity vs. 
wavelength) by using basis functions. The number of basis functions that can be used 
is less than, or possibly equal to, the number of sensors. This method will provide a 
reasonable approximation to the curve only if the variation of intensity with 
wavelength is slow (low-pass) or if some information about the shape of the curve is 
known so that appropriate basis functions can be used. 

Because the relationships among the color signals received from the sensors, the
spectral information they represent, and the perceptions of the HVS are so complex,
an analysis of the effects of data compression techniques on spectral information is
very difficult. Any analysis would have to consider the responses of each of the sensors
and the type of spectral information to be determined, in addition to the type and
magnitude of the errors introduced by a specific data compression technique.
When the information to be derived from the color signal consists solely of HVS
perception, significant compression can be achieved by separating the signal into
luminance and chrominance components and sampling the chrominance signals at a 
lower spatial frequency. This has little effect on the perceived colors. Most likely, this 
is because the HVS performs some similar operation before interpreting the color 
information. If other types of spectral information are needed, this type of compression 
will not be acceptable. 
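
The luminance/chrominance separation described above can be sketched as follows. This is an illustrative example only: the YIQ coefficients are the commonly quoted rounded NTSC values, the function names are hypothetical, and a practical system would low-pass filter the chrominance before discarding samples rather than simply dropping them.

```python
def rgb_to_yiq(r, g, b):
    """Approximate NTSC luminance (Y) and chrominance (I, Q) from RGB."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    i = 0.596 * r - 0.274 * g - 0.322 * b
    q = 0.211 * r - 0.523 * g + 0.312 * b
    return y, i, q

def subsample_line(rgb_line, factor=2):
    """Keep every Y sample but only every `factor`-th I and Q sample."""
    yiq = [rgb_to_yiq(*pixel) for pixel in rgb_line]
    y = [s[0] for s in yiq]
    i = [s[1] for s in yiq[::factor]]
    q = [s[2] for s in yiq[::factor]]
    return y, i, q

line = [(100, 100, 100)] * 8          # eight gray pixels
y, i, q = subsample_line(line)
samples_before = 3 * len(line)        # R, G and B for every pixel
samples_after = len(y) + len(i) + len(q)
```

With a factor of 2 along one line, 24 samples become 16. Because the chrominance is discarded rather than recoded, the operation is not reversible.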

The results reported in the literature on compression of color images usually
involve images for television. In typical color television pictures, the illumination is 
broad-band across the visual spectrum. Therefore, there is some statistical redundancy 
between the three color components. For some of the planned microgravity 
experiments, particularly the self-illuminating ones, this may be far from true. 


1. Nassau, K., The Physics and Chemistry of Color, Wiley-Interscience Publication, pp. 3 and 13,
1983.

2. Wandell, B., "The Synthesis and Analysis of Color Images," IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. PAMI-9, pp. 2-13, Jan. 1987.
Overview of Compression Technique Performance

For all of the reasons mentioned above, the results of various compression 
techniques reported in the literature may have little applicability to HHVT. If a reliable
and accurate analysis of the effects of errors on the information contained in the image
is to be performed, it will have to be done for specific types of images from experiments
once the specifications of the sensors and the content of the images are known. A
general evaluation of the effects of a data compression technique on HHVT images
would have limited value. The results reported in the literature are presented here
only to provide a basis for further analysis.


PERFORMANCE OF COMPRESSION TECHNIQUES

Each entry lists compression (bits per pixel), errors (%MSE), and comments.

REVERSIBLE METHODS

1. Run-Length Coding
   Errors: Not applicable
   Comments: This compression is possible with low detail images, encoded at 8 bpp.
             This compression is possible with typical television images.
             This compression is possible with high detail images.

2. Contour Coding
   Compression: depends on number of contours
   Errors: Not applicable
   Comments: This technique is most effective when used with two-tone line drawings.

3. Huffman Coding
   Errors: Not applicable
   Comments: This technique is most often used in addition to lossy, entropy-reducing
             methods.

4. Arithmetic Coding
   Errors: Not applicable
   Comments: Technique's performance is similar to Huffman Coding.

5. Conditional Replenishment
   Compression: depends on motion or image content
   Errors: Not applicable
   Comments: Reversible compression ratios depend on amount of motion or background
             change in image.
             This compression yields lossy compression with good quality reconstructed
             images.

6. Bit-plane Encoding
   Comments: This method offers additional improvement over previous methods,
             especially when using gray codes.

PREDICTIVE METHODS

1. Linear Predictive Coding
   Errors: highly susceptible to transmission errors
   Comments: Prediction is function of image's statistics.

2. Differential Pulse Code Modulation (DPCM)
   Errors: highly susceptible to transmission errors

3. Delta Modulation
   Errors: highly dependent on quantization; susceptible to transmission errors
   Comments: Analog input signal simplifies implementation, but must be sampled at
             rate higher than Nyquist rate. Marginally acceptable quality.

4. Motion Compensation (MC)
   Comments: Average compression for good quality pictures.

† Data not available in literature reviewed

This table continues on the following page.
PERFORMANCE OF COMPRESSION TECHNIQUES (continued)

Each entry lists compression (bits per pixel), errors (%MSE), and comments.

BLOCK METHODS

1. Vector Quantization (VQ)
   0.6 - 0.8 bpp; 0.1 %MSE. This compression is possible using monochrome images.
   1.6 - 2.0 bpp; 0.1 %MSE. This compression is possible using color images.
   0.1 - 0.2 bpp; †. This compression is possible with a motion compensation
   technique. This bpp holds true if motion is ≤20% in the image.

2. Vector DPCM
   0.6 bpp; †. This compression is achieved using monochrome images.
   0.75 - 1.08 bpp; ‡‡. This compression is achieved using color images.
   In general, Vector DPCM produces better results than VQ at same bit rate.

3. Block Truncation Coding
   1.626 bpp; †. This compression is achieved using monochrome images.
   2.13 bpp; †. This compression is achieved using color images.
   0.9 bpp; †. This compression is achieved using interframe coding.

4. Variable Resolution Coding
   MAPS: 0.693 bpp; 0.82 %MSE.
   Tree coding: 0.2 - 9.0 bpp; †. Depending on the amount of detail, this method
   can be reversible.

HVS COMPENSATION

1. Synthetic High
   Compression ratio a; †. Amount of compression depends on threshold values and
   desired image quality.

2. Pyramid Coding
   0.7 - 1.0 bpp; <1.0 %MSE. Quantization errors occur at high frequencies.

3. Region Growing
   Compression ratio b; †. This technique does not yield good results for small
   objects or details in original image.

4. Directional Decomposition
   Compression ratio c; †. This compression was achieved using 8 bit original.
   Compression ratio d; †. At this compression, this method produces blurred but
   still recognizable images.

5. Anisotropic Nonstationary Predictive Coding
   Compression ratio e; †. This technique does not handle fine texture at high
   compression rates.

†  Data not available in literature reviewed
‡  Square error < 18
‡‡ Square error < 200

Compression ratios depend on number of bits used to encode original and reconstructed images:
a  4:1 to 23:1
b  30:1
c  60:1
d  90:1 to 200:1
e  20:1 to 30:1

This table continues on the following page.
PERFORMANCE OF COMPRESSION TECHNIQUES (continued)

Each entry lists compression (bits per pixel), errors (%MSE), and comments.

HVS COMPENSATION (CONTINUED)

6. Minimum Noise Visibility Coding
   4.6 bpp
   6.6 bpp

7. Constant Area Quantization (CAQ)
   1.08 bpp
   1.2 bpp
   1.0 - 1.3 bpp

8. Perceptual Space Coding
   0.1 bpp
   0.26 bpp

Errors and comments for items 6 to 8, in source order (row alignment not legible):
   This compression was achieved using monochrome images.
   This compression was achieved using color images.
   0.72 %MSE. A low detail, monochrome image was encoded resulting in usable
   quality reproductions.
   0.36 - 3.30 %MSE. A number of color images of varying detail were encoded
   resulting in excellent quality reproductions.

TRANSFORM CODING

1. Karhunen-Loeve (KLT)

2. Discrete Cosine (DCT)
   0.6 - 1.0 bpp. Using adaptive techniques, DCT performs closer to KLT.

3. Slant
   1.0 - 1.6 bpp

   Errors for items 1 to 3: 1.6 - 0.6 %MSE (row not legible).

4. Hadamard
   1.0 - 1.5 bpp; 1.5 - 1.0 %MSE. Classic, straightforward hardware implementation
   exists for this transform; however with VLSI, it is being replaced by better
   performing DCT.

5. Haar
   0.7 - 1.7 bpp; 0.8 - 0.2 %MSE. With adaptive quantization, this transform has
   better performance than Hadamard; but it is also being replaced by DCT.

HYBRID METHODS

1. DCT/VQ
   0.7 - 0.8 bpp; **. This level of compression did not result in high quality
   reconstructed images.
   1.1 bpp; **. At this compression, reconstructed images contained no visible
   distortion.

2. DCT/MC
   0.1 - 0.4 bpp; †. This performance uses adaptive DCT.

†  Data not available in literature reviewed
*  Quantization errors at high frequencies
** 1-2 dB higher signal-to-noise ratio than DCT alone

Reversible Image Compression 

Image compression schemes are said to be reversible, or information-lossless, if
the original digital representation of the image can be fully reconstructed at the 
receiver from the compressed data. Compression can be achieved without any loss of 
information only if the digital representation contains redundancies. This is usually 
the case for digital video images. The measure of the amount of information contained 
in a set of data indicates the entropy of the information source producing the data. 

In order to define the entropy, we must first define the source. A source has an 
alphabet of symbols that it can produce, as well as a set of probabilities for the 
production of each symbol in the alphabet. If the probability of occurrence of each 
symbol is independent of all other symbols, the entropy (bits/symbol) is defined as 


$$H(S) = -\sum_{i=1}^{n} P_i \log_2 (P_i)$$

where $P_i$ is the probability of symbol $i$ occurring and $n$ is the number of symbols
in the alphabet. This is known as the zeroth-order entropy.
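
As a brief illustration, the zeroth-order entropy can be estimated from the relative frequencies of the pixel values in an image. The sketch below assumes the image is supplied as a flat sequence of integer intensities; the function name is hypothetical.

```python
import math
from collections import Counter

def zeroth_order_entropy(pixels):
    """Estimate the zeroth-order entropy in bits per pixel.

    Each symbol probability P_i is approximated by its relative frequency.
    """
    counts = Counter(pixels)
    n = len(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

uniform_patch = zeroth_order_entropy([5] * 8)        # one symbol only
four_levels = zeroth_order_entropy([0, 1, 2, 3])     # four equally likely symbols
```

A single repeated value carries no information (0 bpp), while four equally likely values require 2 bpp.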

It is also possible to have a source where the probabilities of the production of a
given symbol depend on m previously produced symbols. This is known as an mth-order
Markov source. The entropy for this source (sometimes known as a conditional entropy)
is

$$H(S) = -\sum P(s_{j_1}, \ldots, s_{j_m}, s_i) \log_2 P(s_i \mid s_{j_1}, \ldots, s_{j_m})$$

where the sum is taken over all n members of the symbol alphabet for each of the m
symbols obtained from the source, i.e., $n^m$ terms. This entropy will not be greater
than the zeroth-order entropy, with equality only if the symbols are independent.

For a digital video image, the symbols are the intensity values at the pixels. The
alphabet depends on the quantization. For eight-bit PCM quantization there are 256
symbols. The entropy is generally stated in bits per pixel (bpp). In general, the values
at nearby pixels are highly correlated. Therefore, a lower entropy is obtained, since
the image can be represented as the output of a low-order Markov source.
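
The effect of pixel correlation on entropy can be illustrated with a first-order (m = 1) estimate, in which each pixel is conditioned on the pixel before it along a scan line. This is a sketch under that assumption only; the function name is hypothetical and probabilities are estimated from counts.

```python
import math
from collections import Counter

def first_order_conditional_entropy(pixels):
    """Estimate H(current | previous) in bits per pixel along one scan line."""
    pairs = Counter(zip(pixels, pixels[1:]))   # joint counts of (previous, current)
    prev = Counter(pixels[:-1])                # counts of the conditioning pixel
    total = sum(pairs.values())
    h = 0.0
    for (p, _), n_pc in pairs.items():
        # Joint probability times log2 of the conditional probability.
        h -= (n_pc / total) * math.log2(n_pc / prev[p])
    return h

alternating = [0, 1] * 8
h_cond = first_order_conditional_entropy(alternating)
```

The alternating line uses two equally likely values, so its zeroth-order entropy is 1 bpp, yet each pixel is fully determined by its predecessor and the conditional entropy is zero.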

Some reversible image compression schemes are 

1) Run-length Coding 

2) Contour Coding 

3) Huffman Coding 

4) Arithmetic Coding 

5) Conditional Replenishment

All of these reversible techniques will, in general, produce output at a variable 
rate. In order to transmit at a constant rate, a (large) buffer will be required in the 
transmitter. With today’s memory technology, this does not represent a problem. 
Run-length Coding

Very often, large regions of an image are relatively uniform in intensity and/or 
color. This can be used to our advantage by replacing long strings of identical pixels 
with short strings containing the intensity (once) and the number of repetitions. One 
problem with the practical implementation of this scheme is how to allow the receiver 
to distinguish between intensity codes and repetition codes. There are two different 
ways of solving this problem. One method is to divide all of the pixels into runs (of 
length 1 or more) and to include a length code between every intensity code. A second 
method involves reserving one value for an escape code to signal the start of a run. 
The advantage of the latter method is that the worst case scenario will result in no 
compression, whereas for the first method the number of bits per pixel could be 
increased. The cost of the second method is the number of intensity values possible 
for a given number of bits is reduced by one. If the second method is applied to an 
image which was quantized using the full scale of values, the compression will not be 
reversible [3][4]. 
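
The two ways of separating intensity codes from repetition codes can be sketched as follows. This is an illustrative example rather than either of the cited implementations: the escape value `ESC = 255`, the minimum run length of 4, and all function names are assumptions made for the sketch.

```python
def rlc_pairs_encode(pixels, max_run=255):
    """Method 1: every run, even of length 1, becomes an (intensity, length) pair."""
    out, i = [], 0
    while i < len(pixels):
        run = 1
        while (i + run < len(pixels) and pixels[i + run] == pixels[i]
               and run < max_run):
            run += 1
        out.append((pixels[i], run))
        i += run
    return out

def rlc_pairs_decode(pairs):
    return [value for value, run in pairs for _ in range(run)]

ESC = 255  # hypothetical reserved value; pixel data must stay in 0..254

def rlc_escape_encode(pixels, min_run=4, max_run=255):
    """Method 2: ESC, intensity, length marks a run; other pixels are sent as-is."""
    out = []
    for value, run in rlc_pairs_encode(pixels, max_run):
        if value == ESC:
            raise ValueError("pixel value collides with the escape code")
        if run >= min_run:
            out += [ESC, value, run]      # three codes replace a long run
        else:
            out += [value] * run          # short runs are cheaper unencoded
    return out

def rlc_escape_decode(codes):
    out, i = [], 0
    while i < len(codes):
        if codes[i] == ESC:
            out += [codes[i + 1]] * codes[i + 2]
            i += 3
        else:
            out.append(codes[i])
            i += 1
    return out

row = [7] * 6 + [3, 9] + [0] * 5
detail = [1, 2, 3, 4]   # no repeated pixels: the worst case
```

For `detail`, the pair method emits four pairs (eight code values for four pixels), while the escape method emits the four pixels unchanged, which matches the worst-case behavior described above.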

Compression 

Usually, the number of bits used to encode the length of a single run is constant. 
Therefore, there is a maximum run length, m, that can be encoded, with longer runs 
being divided into multiple runs. The length of each run depends on the correlation 
between each pixel and the m previous pixels. The achievable compression is therefore 
limited by the entropy of the image when it is modelled as the output of an mth-order 
Markov source. However, there is no guarantee that run-length coding (RLC) will 
approach this limit. Low detail images may be encoded at about 1.5 - 2.0 bpp for an 
8-bit original. Typical television images will require about 3.5 bpp. High detail images 
could require up to 16 bpp if the first method discussed above is used. 

Contour Coding 

If the image to be compressed is a two-level (binary) image, the entire image can 
be reconstructed from knowledge of the contours that define the boundaries between 
the regions. Therefore, binary images can be encoded at low bit rates by transmitting 
only the pixels that are part of the contours [5]. More savings can be achieved by 
dividing the contours into line segments and assigning each segment a code [6]. These 
techniques can also be applied to multilevel images by including either an intensity 
value for each area or gradient values along the contours. 

Compression 

The compression depends on the number of contours in the image. This technique 
is most effective for line drawings which are mostly background. It is also used when 
coding coefficients in transform coding. 
Huffman Coding

In an entropy encoding scheme the amount of compression achievable is limited 
by the entropy of the data. Huffman coding [7] is such an entropy encoding scheme. 
It takes advantage of the non-uniform distribution of the occurrences of pixel 
intensities, regardless of position in the image. (If the distribution is uniform across 
all possible values, the bit rate is equal to the entropy already, and no compression 
will be achieved.) The technique involves assigning a code to each intensity value with 
the shorter codes going to the more probable events. The compression limit given by 
the entropy of the image is almost achieved. If the probability distribution is known 
in advance, the receiver can store the codes. However, in many cases the codes must 
be generated based on the actual statistics of the image. In this case, the codes must 
be transmitted as overhead information. For large images, the overhead will not be 
significant. 

Compression 

The bit rate is limited by the zeroth-order entropy of the image. Huffman coding 
will achieve bit rates very close to this limit for large numbers of pixels. The
zeroth-order entropy for 8-bit video images is often about 7.5 bpp, so Huffman coding 
is most often used in combination with a lossy, entropy-reducing compression 
technique. 
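
The assignment of shorter codes to more probable intensities can be sketched with the standard merge-the-two-least-probable construction. The example below is a minimal illustration that builds the code from the measured pixel counts; the function name and the tie-breaking counter are choices made for this sketch.

```python
import heapq
from collections import Counter

def huffman_code(pixels):
    """Return a dict mapping each intensity to its binary code string."""
    counts = Counter(pixels)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (count, tie-breaker, partial code table for that subtree).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)   # two least probable subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]

pixels = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
code = huffman_code(pixels)
total_bits = sum(len(code[p]) for p in pixels)
```

For this 16-pixel example the probabilities are powers of one half, so the average code length (28 bits / 16 pixels = 1.75 bpp) equals the zeroth-order entropy exactly. In general the code table would also have to be sent to the receiver as overhead.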


Arithmetic Coding 

Another entropy encoding scheme is Arithmetic coding [8][9][10]. In this 
technique the interval from zero to one (including zero, excluding one) is divided 
according to the probabilities of the occurrences of the intensities. Each subinterval 
is assigned a code representing a fractional value that it contains. The division process 
can be repeated many times by ensuring that the boundaries are rational values and
storing the numerator and denominator as integers. Therefore, long codes can be 
produced for long strings of pixels. As with Huffman Coding, shorter codes are assigned 
to more probable strings. The possibility of stringing together pixels when encoding 
allows the coding rate to approach the entropy of the image even more closely than 
Huffman codes which must assign an integral number of bits to the code of each pixel. 
The codebook is not transmitted, but the decoding is done through knowledge of the 
probabilities of the intensity levels. 
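
The repeated interval division can be sketched with exact rational arithmetic. This is an illustration only, not the method of any cited reference: it uses Python's `Fraction` type to hold the numerator and denominator, assumes the receiver knows the symbol probabilities and the string length, and omits the final step of choosing a short binary fraction inside the interval. All function names are hypothetical.

```python
from fractions import Fraction
from collections import Counter

def build_intervals(pixels):
    """Divide [0, 1) into one subinterval per intensity, sized by probability."""
    counts = Counter(pixels)
    total = len(pixels)
    intervals, low = {}, Fraction(0)
    for sym in sorted(counts):
        width = Fraction(counts[sym], total)
        intervals[sym] = (low, low + width)
        low += width
    return intervals

def arithmetic_encode(pixels, intervals):
    """Return the final interval [low, high); any value inside identifies the string."""
    low, high = Fraction(0), Fraction(1)
    for sym in pixels:
        span = high - low
        s_low, s_high = intervals[sym]
        low, high = low + span * s_low, low + span * s_high
    return low, high

def arithmetic_decode(value, length, intervals):
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        span = high - low
        for sym, (s_low, s_high) in intervals.items():
            if low + span * s_low <= value < low + span * s_high:
                out.append(sym)
                low, high = low + span * s_low, low + span * s_high
                break
    return out

pixels = [0, 0, 0, 1, 0, 2, 0, 0]
intervals = build_intervals(pixels)
low, high = arithmetic_encode(pixels, intervals)
decoded = arithmetic_decode(low, len(pixels), intervals)
```

The width of the final interval equals the product of the probabilities of the encoded pixels, so probable strings end in wide intervals that can be identified with few bits.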

Compression 

Similar to Huffman coding. (See above) 
Conditional Replenishment

When there is very little motion in a sequence of video frames, there will be high 
correlation between individual pixels of adjacent frames. Much compression can be 
achieved by only transmitting the locations and intensity values of those pixels that 
changed since the previous frame. The receiver saves the previous frame and 
replenishes the changed pixels with their new values. The amount of compression 
that can be achieved depends on the correlation between frames. If the replenished 
pixels appear in runs, the overhead information could be reduced by transmitting a 
starting address and a run length instead of a location for each pixel. Much greater 
compression can be achieved by using conditional replenishment as a non-reversible 
technique. This is accomplished by not transmitting any pixel whose value is "close" 
to that of the previous frame [11][12][13]. 
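
A minimal sketch of conditional replenishment on a single line of pixels is given below. The function names and the `threshold` parameter are assumptions made for the example: a threshold of zero gives the reversible form, and a positive threshold gives the non-reversible form in which "close" pixels are not transmitted.

```python
def cr_encode(previous, current, threshold=0):
    """Return (address, value) pairs for pixels that changed by more than threshold."""
    return [(i, c) for i, (p, c) in enumerate(zip(previous, current))
            if abs(c - p) > threshold]

def cr_decode(previous, updates):
    """Replenish the stored previous frame with the transmitted pixels."""
    frame = list(previous)
    for i, value in updates:
        frame[i] = value
    return frame

previous = [10, 10, 10, 50, 50]
current = [10, 11, 10, 80, 50]
lossless = cr_encode(previous, current)               # every changed pixel
lossy = cr_encode(previous, current, threshold=2)     # small changes suppressed
```

In the lossy form, the transmitter would normally compare each new frame against the frame the receiver has actually reconstructed, not against the true previous frame, so that suppressed changes do not accumulate over many frames.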

Compression 

Lossy conditional replenishment (CR) can obtain bit rates as low as 1.0 bpp with 
good quality. Reversible compression ratios depend on the amount of motion or
background change in the specific scene. 

Bit-plane Encoding 

Some of the lossless compression techniques we have discussed, specifically 
run-length coding, and contour coding can be used more effectively if they are applied 
to each bit-plane separately. This can be done by using the binary representation of 
the pixel intensity values of the image. The most significant bit at each pixel is treated 
as a binary image. The compression scheme is applied to this image. Each of the 
remaining bits is treated likewise. At the receiver each bit-plane is decoded, and the 
bit-planes are combined to reconstruct the image. This scheme will often provide the 
possibility of more compression since the more significant bit-planes will have large
uniform areas which can be greatly compressed [14].

If bit-plane coding is to be used, the amount of achievable compression can usually
be increased by replacing the standard binary representation of the intensity values
with a gray code. The gray code reorders the binary symbols such that consecutive
symbols differ by exactly one bit. Since real images often have consecutive intensity
values at adjacent pixels, using the gray code can increase the size of uniform regions. 
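
The benefit of the gray code can be illustrated by counting, over all bit-planes, how often neighboring pixels differ. The sketch below uses the common reflected binary gray code; the function names and the sample line of pixels are choices made for the example.

```python
def to_gray(value):
    """Reflected binary gray code of a non-negative integer."""
    return value ^ (value >> 1)

def from_gray(gray):
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value

def bit_planes(pixels, bits=8):
    """Split pixels into binary images, most significant plane first."""
    return [[(p >> b) & 1 for p in pixels] for b in range(bits - 1, -1, -1)]

def count_transitions(plane):
    """Number of places where neighboring pixels differ within one plane."""
    return sum(a != b for a, b in zip(plane, plane[1:]))

row = [126, 127, 128, 129, 128, 127]   # slowly varying intensities near mid-scale
binary_total = sum(count_transitions(p) for p in bit_planes(row))
gray_total = sum(count_transitions(p)
                 for p in bit_planes([to_gray(v) for v in row]))
```

In standard binary the step from 127 to 128 changes all eight bit-planes at once, whereas in gray code each unit step changes exactly one plane, leaving longer uniform runs for run-length or contour coding.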


3. Pracht, B.R. and Bowyer, K.W., "Adaptations of Run-Length Encoding for Image Data," IEEE,
pp. 6-10, 1987.

4. Lansing, D.L., "Experiments in Encoding Multilevel Images as Quadtrees," NASA Technical
Paper, no. 2722, 1987.

5. Schreiber, W.F., Huang, T.S., et al., "Contour Coding of Images," pp. 443-8, 1972.

6. Chaudhuri, B.B. and Kundu, M.K., "Digital Line Segment Coding: A New Efficient Contour
Coding Scheme," IEEE Proceedings, vol. 131 E, pp. 143-147, July 1984.

7. Huffman, D.A., "A Method for the Construction of Minimum-Redundancy Codes," Proceedings of
the IRE, pp. 1098-1101, September 1952.

8. Jones, C.B., "An Efficient Coding System for Long Source Sequences," IEEE Transactions on
Information Theory, vol. IT-27, pp. 280-291, May 1981.
Predictive Methods

Differential Pulse Code Modulation 

Predictive methods involve predicting the intensity value at a given pixel based 
on the values of previously processed pixels. Usually, the difference between the 
predicted value and the actual value is transmitted. This technique is generally known 
as Differential Pulse Code Modulation (DPCM). The receiver makes the same 
prediction as the transmitter and then adds the difference signal to it in order to 
reconstruct the original value. A predictive method usually involves three 
steps: prediction, quantization, and coding. These functions are shown in Figure 1. 



Figure 1: Differential Pulse Code Modulation encoder 

The difference, $d$, is obtained by subtracting $\hat{x}$, the prediction, from the input $x$. The
difference signal, $d$, is quantized as shown and the signal $d_q$ is transmitted. The
quantized input to the predictor, $x_q$, is obtained by combining $\hat{x}$ with $d_q$, as shown.

Various reversible coding techniques, discussed in Reversible Image 
Compression, can be applied to the quantized difference signal as presented. 
Therefore, only prediction and quantization will be covered in this section. 

Predictive methods can be applied in one or two dimensions within a frame and/or 
on a frame-to-frame basis. The methods can either be fixed or adaptive. Adaptivity
can be included in the prediction and/or in the quantization. 
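
The encoder loop described for Figure 1 can be sketched for the simplest case, a previous-pixel predictor with a uniform quantizer. This is an illustration under those assumptions and not the specific design of the figure; the function names, the starting value of 128, and the `step` parameter are hypothetical.

```python
def quantize(d, step):
    """Uniform quantizer: nearest multiple of step (integer arithmetic)."""
    q = (abs(d) + step // 2) // step * step
    return q if d >= 0 else -q

def dpcm_encode(pixels, step=1, initial=128):
    x_q = initial                 # reconstructed value of the previous pixel
    out = []
    for x in pixels:
        x_hat = x_q               # prediction: previous reconstructed pixel
        d_q = quantize(x - x_hat, step)
        out.append(d_q)
        x_q = x_hat + d_q         # same reconstruction the receiver will form
    return out

def dpcm_decode(diffs, initial=128):
    x_q, out = initial, []
    for d_q in diffs:
        x_q = x_q + d_q           # add the difference to the prediction
        out.append(x_q)
    return out

pixels = [130, 131, 135, 140, 138]
exact = dpcm_decode(dpcm_encode(pixels, step=1))
coarse = dpcm_decode(dpcm_encode(pixels, step=4))
```

Because the encoder predicts from its own reconstructed values rather than from the original pixels, the receiver makes identical predictions and the quantization error does not accumulate. With `step=1` the difference keeps full precision and the process is reversible.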
Prediction :

In the prediction step we try to predict the intensity value at the current pixel 
as closely as possible. This is done to take advantage of the correlation existing between 
pixels in a region. If the correlation is high, the predictions will usually be fairly close 
to the actual values. Therefore, most of the differences will be of low absolute value, 
and the zeroth-order entropy of the difference signal will be lower than that of the 
original signal. 

The simplest prediction methods use the value at the previous pixel as the 
predicted value. Somewhat better results can be obtained by using a linear 
combination of n previous pixels. Better prediction can also be achieved by using 
two-dimensional prediction. This is usually done by using a linear combination of the 
previous pixel and adjacent pixels from the previous row. The optimal coefficients of 
the linear combinations can be derived from the statistics of the image. They can be 
computed separately for each small region of the image. In this case the coefficients 
must be sent to the receiver along with the difference signal [15][16]. 

Another possibility, used in 2-D prediction, is to choose a small number of combinations of 
coefficients [17] [18]. For each pixel the prediction is made using each of these 
combinations. The best prediction is chosen, and a code is sent to the receiver to 
indicate which combination was used. Making the predictions in this way is especially 
effective when the image contains sharp edges with arbitrary orientations. 

Linear Predictive Coding : 

The technique known as Linear Predictive Coding (LPC) [19][20][21] is commonly 
used for speech compression. It is essentially an adaptive nth-order predictive method 
as described above. When LPC is used for speech compression, the difference signal 
does not need to be transmitted since it can be approximated by either white noise or 
a periodic pulse train. These can be generated at the receiver and applied to the 
appropriate prediction algorithm. This tremendous advantage does not exist, in 
general, for video images. Therefore, just as with other predictive techniques, the 
difference signal must be transmitted. Even so, LPC is an effective technique since 
the prediction is adaptively optimized for the local statistics of the image. 

Quantization : 

To transmit the difference signal, it must first be quantized. If the difference 
signal is quantized to the same precision as the original signal, the process is reversible 
as mentioned above. If the prediction is highly accurate, the entropy (information 
content) of the full-precision difference signal will result mostly from the less 
significant bits. This means using fewer bits to quantize the signal can significantly 
reduce the entropy without too much reduction in image quality. If some reduction 
in image quality is acceptable, much better compression can be obtained by reducing 
the number of quantization levels, thereby reducing the number of bits required to 
transmit the difference signal. The optimal distribution of the reduced number of 
quantization levels depends on the statistics of the image and on the relative 
significance to the user of various types of errors in the intensity values. 

To keep the maximum error below a certain threshold, the levels will be 
distributed uniformly over the possible values of the difference. To minimize the MSE, 
the smaller magnitudes will be quantized with more values. The larger magnitudes 
will be quantized with few values since they occur less frequently. Non-uniform 
quantization is also appropriate for minimal subjective degradation of the image since 
the sensitivity of the human eye to small variations is much greater within smooth 
regions than near sharp edges. The signal can also be quantized to preserve certain 
statistical properties of the image [22]. 

Max [23] provided a method for determining the optimal distribution of 
quantization levels for a given distortion criterion when the statistics of the signal are 
known. He also provided solutions for 1 to 36 quantization levels which will minimize 
the MSE for a normal distribution. The Max quantizer has been widely used in both 
predictive and transform compression schemes. 
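
A sample-based version of this optimization, often called the Lloyd-Max iteration,
can be sketched as follows. It assumes the MSE criterion and works on a list of
samples rather than on a known density; the function name and the uniform starting
levels are choices made for this illustration.

```python
def lloyd_max(samples, n_levels, iterations=50):
    """Iteratively place quantization levels to reduce MSE on the given samples."""
    data = sorted(samples)
    lo, hi = data[0], data[-1]
    # Start from uniformly spaced levels over the sample range.
    levels = [lo + (hi - lo) * (i + 0.5) / n_levels for i in range(n_levels)]
    for _ in range(iterations):
        # Decision thresholds lie midway between adjacent levels.
        thresholds = [(a + b) / 2 for a, b in zip(levels, levels[1:])]
        bins = [[] for _ in levels]
        for x in data:
            bins[sum(x > t for t in thresholds)].append(x)
        # Each level moves to the centroid of its bin; empty bins keep their level.
        levels = [sum(b) / len(b) if b else lv for b, lv in zip(bins, levels)]
    return levels
```

For a peaked distribution the resulting levels crowd toward the small magnitudes,
which is the non-uniform placement described in the text.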

Quantization is implemented adaptively for a variety of purposes. These include 
better handling of sharp intensity gradations without sacrificing good quality in 
smooth regions [24]. Also, adaptive quantization adjusts to local statistics to reduce 
the local error, as well as ensuring a more uniform output bit rate when entropy 
encoding is used on the difference signal. The adaptivity can be implemented either 
by changing the placement of quantization levels or by switching between quantizers 
using different numbers of levels. 

One difficulty encountered with DPCM is that errors in transmission will 
propagate through the image because predictions at the receiver will not be identical 
to those at the transmitter. A common solution to this problem is to reduce the 
magnitude of the predicted value by a few percent before calculating the difference 
signal. This is known as introducing "leak" into the prediction. It will cause the effect 
of the error to diminish from pixel to pixel, thereby confining the effects to a small 
region. 
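
The decay of a transmission error under leak can be illustrated with a small
receiver model. The leak factor of 0.95, the previous-sample predictor and the
function name are assumptions of this sketch.

```python
def decode_with_leak(diffs, leak=0.95):
    """Receiver for previous-sample DPCM whose prediction is scaled by `leak`."""
    value = 0.0
    out = []
    for d in diffs:
        value = leak * value + d    # leaky prediction plus received difference
        out.append(value)
    return out


# A constant level of 10 needs a difference of 10 * (1 - leak) = 0.5 per sample.
clean = [10.0] + [0.5] * 39
corrupted = list(clean)
corrupted[5] += 4.0                 # a single hypothetical transmission error
errors = [abs(a - b) for a, b in
          zip(decode_with_leak(corrupted), decode_with_leak(clean))]
```

In this model the error shrinks by the leak factor at every pixel; with a leak of 1.0
(no leak) it would persist for the rest of the line.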

Aside from transmission errors, the information which is lost in using a predictive 
method results from the quantization error. The quality of the prediction affects the 
error only indirectly in the sense that a better prediction allows a finer quantization 
of the differences at the same bit rate, resulting in a better quality reproduction of the 
image. The number of quantization levels, as well as their placement, can be adjusted 
to match various error criteria. 

Compression 

Good quality images can be obtained at bit rates of 3 - 4 bpp for nonadaptive 
DPCM and 2-3 bpp for adaptive DPCM [25]. Test pictures that were encoded with 
3-bit adaptive DPCM had signal-to-noise ratios of over 40 dB [26]. There have been 
claims of very good image quality at around 1 bpp [27]. The discrepancy may lie in 
the subjective determination of "good" images. Color images were reproduced with no 
perceptible degradation at a bit rate of 5 bits/sample where a composite PAL signal 
was sampled at 3 times the subcarrier frequency (20% over the Nyquist rate) [28]. 

Spatial Domain 

Most of the compression is achieved by taking advantage of the statistical 
distribution of the differences (most of them will be small in magnitude) through the 
use of coarse quantization. Using coarse quantization will result in quantization noise 
that can cause distortion in the reconstructed image. This distortion can generally be 
classified into three types: granular noise, slope overload, and edge busyness. 

Granular noise refers to small variations in intensity in regions where the 
intensity should be constant. It results from not having enough quantization levels 
for very small differences. Slope overload refers to errors resulting from not having a 
large enough quantization level for the difference signal that occurs at a sharp edge. 
This type of error will tend to blur the edges since it will take a few pixels for the 
reconstructed image to "catch up" to the original image. It will also tend to shift the 
edges "downstream" in the image since the predictions are causal which makes the 
difference signal lag the original signal. Edge busyness can result from not having 
enough middle-to-large quantization levels so that a continuous edge may appear 
discontinuous between consecutive pixel rows. 

The nature of the degradation of the image for a given bit rate can be controlled 
by the distribution of the quantization levels. Placing more levels at smaller difference 
values (to reduce granularity) will improve the picture in regions of low detail at the cost 
of large errors at sharp edges. Many adaptive quantization schemes are designed to 
reduce the specific type of distortion that would be most critical in the local region. 
Adaptive 2-D prediction schemes have also been developed for the purpose of predicting 
along contours to reduce situations that would lead to slope overload and edge 
busyness. 

If the difference signal is subsequently entropy encoded, the amount of detail in 
an image can affect the number of quantization levels available at a given bit rate. If 
there is a great amount of detail in the image, a significant number of the differences 
will be large. This will raise the entropy of the difference signal, and, therefore, less 
compression will be possible for a given number of quantization levels. 

Temporal Domain 

Temporal effects will only be of interest for interframe DPCM. (See Motion 
Compensation) 

Aesthetic Appearance 

The appearance of the reconstructed image depends on the amount and 
distribution of the degradations mentioned above. The visibility of each type of 
degradation depends on its location in the image. Granular noise, for instance, is more 
visible in constant intensity regions of large area than it is near edges. 

One method that is used to reduce the visibility of quantization noise is to add 
pseudo-random noise (PRN) before quantizing the difference signal. The same noise 
is subtracted at the receiver. The effect of doing this is to break up any noticeable 
patterns in the quantization errors. 

Spectral Information 

The distortions produced by encoding color images were generally similar to those 
produced by using monochrome images. 

Delta Modulation 

One technique, known as Delta Modulation (DM), involves quantizing the 
difference signal with one bit, i.e., two levels, one positive and one negative. The 
locations of the levels are usually determined adaptively in order to reasonably 
represent both smooth regions and edges [29][30][31]. Figure 2 shows a block diagram 
of a Delta Modulation encoder. 

Figure 2: Delta Modulation encoder 

The DM encoder is similar to the DPCM encoder, where the quantizer uses two 
levels. The effects of adaptively changing the quantizing levels are illustrated in the 
two time plots in Figure 2, showing the response to a step change. In both plots the 
effects of slope overload can be seen. However, in the right-hand plot, using adaptive 
quantizing, the response arrives at the steady state level more rapidly. 

An advantage of DM over other forms of DPCM is that it does not require a 
digitizer. This makes it much simpler to implement if the input signal that the encoder 
receives is in analog form. 
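
The step response behaviour described above can be reproduced with a short
simulation. The adaptation rule used here (grow the step when consecutive bits
repeat, shrink it when they alternate) is one common choice and is assumed for the
sketch; the cited schemes may adapt differently.

```python
def delta_modulate(samples, step=1.0, adapt=1.0):
    """One-bit DPCM. With adapt > 1 the step grows on repeated bits and
    shrinks on alternating bits; adapt = 1 gives nonadaptive DM."""
    estimate, prev_bit = 0.0, None
    bits, track = [], []
    for x in samples:
        bit = 1 if x >= estimate else 0
        if prev_bit is not None and adapt != 1.0:
            step = step * adapt if bit == prev_bit else step / adapt
        estimate += step if bit else -step
        bits.append(bit)
        track.append(estimate)      # what the receiver reconstructs from the bits
        prev_bit = bit
    return bits, track


step_input = [20.0] * 40            # hypothetical step change from 0 to 20
_, fixed = delta_modulate(step_input, adapt=1.0)
_, adaptive = delta_modulate(step_input, adapt=1.5)
```

Both tracks show slope overload while they climb, but the adaptive track reaches the
new level in far fewer samples.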

Compression 

Nonadaptive DM produces 1 bit per sample. However, in order to maintain 
reasonable quality the analog signal must be sampled at a rate higher than the Nyquist 
rate. When the input signal is digital, adaptive DM can be used to improve image 
quality at a bit rate of 2 bpp. The image quality at a bit rate of 2 bpp is marginally 
acceptable. 

Spatial Domain 

The coarse quantization used in DM results in quantization noise that can cause 
distortion in the reconstructed image. The distortion produces three types of noise 
similar to DPCM: granular noise, slope overload, and edge busyness. Adaptive DM 
will reduce the magnitude of these effects. 

Temporal Domain 

Temporal effects will only be of interest for interframe DPCM. (See Motion 
Compensation) 

Aesthetic Appearance 

DM is susceptible to transmission errors. The errors result in streaks, where the 
duration of the streaks is an exponential function of the leak introduced in the 
transmitter. Edge delay, edge wiggle and edge busyness are distortions characteristic 
of DM. The magnitude of these distortions increases with increasing compression. 

Spectral Information 

The distortions produced by encoding color images were generally similar to those 
produced by using monochrome images. 

Motion Compensation 

Motion compensation (MC) is a popular technique for improving interframe 
prediction. It can be used in combination with one of many intraframe compression 
methods. 

There are two general categories of MC techniques: pixel recursive algorithms 
and block matching algorithms [32]. In both of these techniques, the predictor uses 
the intensity value of a pixel in the previous frame which may be displaced spatially 
from the current pixel in order to compensate for the motion of the physical objects 
being depicted in the image. The goal of the algorithm is to find the spatial displacement 
that gives the best prediction. 

Pixel recursive algorithms update, at each pixel, the spatial displacement that 
is used in the prediction. The update is based on the difference value at the previous 
pixel of the current frame [33] [34]. 

Block matching algorithms compare small blocks of pixels in the new frame to 
displaced blocks in the previous frame, as shown in Figure 3. 

Figure 3: Motion Compensation block search 

The spatial displacement which gives the best match is used in the pixel-by-pixel 
prediction. Since most frame-to-frame displacements due to motion are of small 
magnitude, regions of the image containing motion can usually be predicted accurately 
by an MC technique with a reasonable number of calculations. If none of the comparisons 
result in a sufficiently good match, intraframe coding is used on the current block. 
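
A block search of this kind can be sketched as an exhaustive comparison over a small
window. The sum of absolute differences (SAD) is used as the match measure here; that
measure, the block size, the search range and the function name are assumptions of
the sketch.

```python
def best_displacement(prev, cur, top, left, size=4, search=2):
    """Exhaustive block search. Returns (sad, dy, dx) for the displaced block in
    `prev` that best matches the block of `cur` at (top, left)."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y0, x0 = top + dy, left + dx
            if y0 < 0 or x0 < 0 or y0 + size > h or x0 + size > w:
                continue            # displaced block falls outside the frame
            sad = sum(abs(cur[top + r][left + c] - prev[y0 + r][x0 + c])
                      for r in range(size) for c in range(size))
            if best is None or sad < best[0]:
                best = (sad, dy, dx)
    return best
```

A caller would compare the returned SAD with a threshold and fall back to intraframe
coding for the block when the best match is not good enough.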

Compression 

Motion compensated prediction can produce good quality pictures at an average 
rate of 1.5 bpp. 

Spatial Domain 

Granular noise in the region near edges of moving patterns is, in general, 
attributable to the use of MC. 

Temporal Domain 

Holding the output bit rate of an interframe predictive method, such as MC, at 
a given level could produce very low quality images when the amount of motion is high. 
In general, however, these techniques are designed to produce an output bit rate that 
is variable and depends on the amount of motion in the image. The quality of the 
image is maintained at or above a given level. The entropy of the difference signal for 
interframe methods will clearly be lower for sequences with less activity since the 
frame-to-frame predictions will be more accurate. 

Aesthetic Appearance 

Motion compensated predictive methods attempt to predict image intensities 
accurately when there is some motion in a region. They will only be able to do so for 
slowly moving objects. In effect, they consider low-motion regions to be stationary, 
thereby increasing the fraction of the image that is stationary and reducing the bit 
rate. Just as with interframe DPCM, moving edges result in large prediction differences, 
in turn resulting in slope overload. The aesthetic appearance, however, is distributed 
from frame to frame. A scene change will be treated as high-speed motion 
(unpredictable) over most of the image. 

Spectral Information 

The distortions produced by encoding color images were generally similar to those 
produced by using monochrome images. Chrominance noise appears in the region of 
moving edges. 

Block Methods 

Vector Quantization 

Vector Quantization (VQ) [35][36][37], which is also called block quantization or 
block source coding, uses a block of intensity values, treated as a vector. The block 
size is usually 4x4 since it is difficult to design a good representative codebook for 
larger blocks [38]. The vector is compared to a codebook of vectors and the code for 
the "closest" (minimizing some measure of distortion) match is transmitted. This 
process is illustrated in the figure below using a block of 3 x 3 pixels. 



Figure 4: A Vector Quantization encoder using a block of 3 x 3 pixels 

It is the choice of the size and contents of the codebook that requires some 
ingenuity. The distortion measure, used for both the design of the codebook and for 
the comparisons during encoding of the image, should quantify the most critical type 
of distortion for the end user. The vector quantizer will then minimize this distortion 
measure [39]. 

The codebook consists of a small subset of all the possible vectors of intensity 
values. It can be produced from the probability distribution of the image. More commonly, 
if the statistics of the image are not known a priori, the codebook is made from a string 
of training vectors which are assumed to be representative of the data to be transmitted. 
In the latter case, training vectors will produce less distortion if they are taken from 
the image itself. However, this requires an adaptive implementation involving the 
production and transmission of a new codebook for each image. A goal of the design 
of vector quantizers is to design codebooks that are universal without being too large. 
A large codebook, while reducing distortion, also reduces the effectiveness of VQ in 
two ways. First, the bit rate is higher since the number of bits required for each 
codeword is log2(N) where N is the number of codevectors. Second, the time required 
to search the codebook increases rapidly with its size. 
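
The encoding step and the bit-rate relation can be sketched as follows. Squared error
is assumed as the distortion measure and the codebook is searched exhaustively; the
function names and the tiny codebook are hypothetical.

```python
from math import log2


def vq_encode(block, codebook):
    """Index of the codevector with the smallest squared error to `block`.
    The full search visits every codevector, so cost grows with codebook size."""
    def sq_err(v):
        return sum((a - b) ** 2 for a, b in zip(block, v))
    return min(range(len(codebook)), key=lambda i: sq_err(codebook[i]))


def bits_per_pixel(n_codevectors, pixels_per_block):
    """Fixed-length codeword of log2(N) bits shared by all pixels of a block."""
    return log2(n_codevectors) / pixels_per_block


# Hypothetical 2 x 2 blocks flattened to 4-vectors.
codebook = [[0, 0, 0, 0], [10, 10, 10, 10], [0, 10, 0, 10], [255, 255, 255, 255]]
index = vq_encode([9, 10, 11, 10], codebook)
```

Under these assumptions a 256-entry codebook applied to 4 x 4 blocks costs
`bits_per_pixel(256, 16)` = 0.5 bpp before any side information.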

In order to reduce the size of the codebook for a given distortion level, the mean 
and standard deviation are removed from each block and transmitted separately [40]. 
Another method that reduces the bit rate without increasing the distortion is called 
an Address-Vector Quantizer (A-VQ) [41]. For this method the codebook is divided 
into two parts. Most of the codevectors contain the codes (addresses) of four standard 
codevectors. To encode an image, each 4x4 block is vector quantized using the 
standard codevectors. Then the blocks are combined into groups of four. If the four 
codes match those contained in the address-codebook, this one code is substituted for 
the four standard codes. Otherwise, the four codes are transmitted as usual. In this 
way A-VQ can reduce the bit rate nearly 50% with no reduction in image quality. 

Other methods have been developed to improve the subjective image quality for 
a given codebook size by segmenting the codebook and assigning more codevectors to 
"edge" vectors since the accurate reproduction of edges is necessary for good quality 
images. 

VQ can be used for multispectral images by treating the multiple intensity values 
at each pixel as a vector. The spatial and spectral aspects of a color image can also be 
combined into a single vector [42]. 

Compression 

Straightforward implementations of VQ (segmented codebook or mean 
subtraction) can produce good quality images at rates of 0.5 - 0.8 bpp for monochrome 
images and 1.5 - 2.0 bpp for color images. A motion compensated technique has been 
developed [43] that produces color images at 0.1 - 0.2 bpp if the motion covers less than 
20% of the image. 

Spatial Domain 

The objective errors resulting from VQ depend on the distortion measure that is 
used. If the acceptable and unacceptable types of distortion can be quantified, a 
codebook can, in theory, be produced to meet these criteria. In practice, algorithms 
for producing vector codebooks have only been developed for a few distortion measures. 
The MSE is about 0.1% for standard VQ. 

Temporal Domain 

Not Applicable. 

Aesthetic Appearance 

MSE does not accurately represent the perceived image quality. This can be seen 
at sharp edges where a "staircase effect" is produced by standard VQ [44]. The 
degradation of these edges does not significantly increase the MSE, but it does decrease 
the perceived quality. The number of possible edge orientations is too large to 
include all the respective "edge" vectors in a code set. Therefore edges are degraded. 

Another less significant effect which can occur is some blocking in smooth areas 
[45]. The techniques mentioned above which transmit the mean separately should 
eliminate most of this problem. 

Spectral Information 

The reports of using VQ for color images did not include an analysis of the effects 
on spectral information. 

Vector DPCM 

Vector DPCM [46] performs prediction on a block by block basis. The intensities 
at all the pixels in a block are predicted based on the intensities at the pixels adjacent 
to the left and upper borders of the block (the nearest pixels that have already been 
transmitted). The differences at all the pixels in the block are coded by vector 
quantization. 

Compression 

Very good reproduced images were obtained at 0.5 bpp for monochrome images 
and 0.75 - 1.08 bpp for color images with sub-sampling of the chrominance signals. 
The square error per pixel for these images was less than 18 for monochrome and 
less than 200 for color. 

The distortions produced by this method are similar to those produced by VQ. 
In general, Vector DPCM produces better results than VQ at the same bit rate. 

Block Truncation Coding 

Block Truncation Coding (BTC) [47][48] divides the image to be coded into blocks. 
The block size is usually 3 x 3 or 4 x 4 pixels. Within each block a threshold is chosen, 
and the value at each pixel is coded as a 1 or 0 depending on whether it is above or 
below the threshold. To decode the image, a high value is assigned to each pixel that 
has a 1, and a low value is assigned to the others. The most common techniques 
attempt to preserve the mean and variance of each block. They choose the mean as 
the threshold value. The low, a, and high, b, values at the decoder are computed as 
follows: 

$$a = \eta - \sigma \sqrt{\frac{q}{m - q}}, \qquad b = \eta + \sigma \sqrt{\frac{m - q}{q}}$$

where $\eta$ is the mean and $\sigma$ is the standard deviation, m is the total number of pixels 
in the block and q is the number of pixels whose value is greater than the threshold. 
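
A minimal sketch of the mean- and variance-preserving form of BTC follows. It assumes
the standard moment-preserving output levels, a = mean - sigma * sqrt(q / (m - q)) and
b = mean + sigma * sqrt((m - q) / q), and leaves the quantization of the mean and
standard deviation for transmission out of scope. The function names are illustrative.

```python
from math import sqrt


def btc_encode(block):
    """Return the block mean, standard deviation and one bit per pixel."""
    m = len(block)
    mean = sum(block) / m
    sigma = sqrt(sum((x - mean) ** 2 for x in block) / m)
    bits = [1 if x > mean else 0 for x in block]    # threshold at the mean
    return mean, sigma, bits


def btc_decode(mean, sigma, bits):
    """Rebuild a two-level block with the same mean and variance."""
    m, q = len(bits), sum(bits)
    if q == 0 or q == m:
        return [mean] * m                           # flat block, no second level
    a = mean - sigma * sqrt(q / (m - q))            # low output level
    b = mean + sigma * sqrt((m - q) / q)            # high output level
    return [b if bit else a for bit in bits]
```

Decoding the output of `btc_encode` gives a block with only two distinct values whose
mean and variance equal those of the original block.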

Interframe BTC can be implemented on 4x4x3 blocks of pixels if motion 
compensation is included. Whenever the motion is too great, the algorithm switches 
to 2-D BTC [49]. 

BTC lends itself very nicely to parallel processing since the coding of each of the 
blocks is totally independent. Some parallel algorithms have been developed [50]. 

Compression 

Using 4x4 blocks will produce a bit rate of 1.625 bpp for a monochrome image, 
assuming that the mean and standard deviation together are transmitted with 10 bits. 
NTSC color images can be transmitted at 2.13 bpp by sub-sampling the I-component 
at 2:1 and the Q-component at 4:1 and using BTC for each component independently 
[51]. Interframe coding of 4 x 4 x 3 blocks can bring the bit rate down to 0.9 bpp. 
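
The monochrome figure follows directly from the block layout: 16 bitmap bits plus 10
side bits for every 16 pixels. The color figure can be reproduced if the 2:1 and 4:1
sub-sampling ratios are taken to apply in each of the two image dimensions; that
reading is an assumption of this sketch and is not stated in the text.

```python
def btc_bpp(block_pixels=16, side_bits=10):
    """One bit per pixel plus side information for the mean and standard deviation."""
    return (block_pixels + side_bits) / block_pixels


luma_bpp = btc_bpp()                                  # 26 / 16 = 1.625

# Assumed: I sub-sampled 2:1 and Q sub-sampled 4:1 in both dimensions.
i_fraction = 1 / (2 * 2)
q_fraction = 1 / (4 * 4)
color_bpp = luma_bpp * (1 + i_fraction + q_fraction)  # about 2.13
```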

Spatial Domain 

Some of the reported distortion effects are edge raggedness and 
misrepresentation of mid-range values [52]. BTC also tends to produce artifacts of 
sharp, well-defined edges [53]. 

Temporal Domain 

The motion appeared to be continuous and clear for the 3-D BTC. 

Aesthetic Appearance 

In addition to the distortions reported above, BTC tends to enhance edges at the 
expense of secondary changes which are less perceptible. This is actually a subjective 
improvement in the image [54]. In general the images appeared sharp at the 
transmission rates reported above. 

Spectral Information 

When coding a color image, there are small areas that contain errors in the color. 
This is due to different amounts of distortion in the three components. However, the 
overall color quality is good. 

Variable-Resolution Coding 

Variable-resolution coding includes methods which use local contrast to 
adaptively vary the resolution of subareas. They do this by representing blocks of 
pixels in low detail regions with one intensity value, usually the mean of the intensity 
values in that block. We will discuss two variable-resolution coding techniques, 
Micro-adaptive Picture Sequencing (MAPS) and Tree Coding. The two techniques can 
be implemented to retain the same intensity information. They differ only in the 
details of the compression algorithm and the overhead information. They both can be 
implemented for reversible compression by requiring regions of identical intensity 
values, rather than just low contrast. 

Micro-adaptive Picture Sequencing 

The MAPS [55][56] algorithm starts with the lowest level block, which is usually 
individual pixels, and attempts to replace four blocks at a given level with a block at 
the next level. If the contrasts among the four lower level blocks in the appropriate 
locations are below the contrast thresholds, the lower level blocks are combined into 
one larger block. MAPS includes an ordering of blocks where every lower level block 
of a level n block is completely covered before the next block of the same level is begun. 
Practically, this means the position of the blocks of different sizes can be determined 
implicitly from the sizes of the blocks. The overhead information, therefore, includes 
only one number for each block to indicate its level. Normally, this number will be 2 
or 3 bits since the case of blocks larger than 128 x 128 pixels is unlikely to occur. Using 
only two bits will result in lower overhead, but will limit the block size to 8 x 8. 

Tree Coding 

Tree coding [57][58] starts with the entire image. The image is divided into 
square subregions. Any subregion having low contrast is encoded with a single 
intensity value. A subregion having high contrast is divided further. This process is 
repeated down to the level of individual pixels. By processing the image in this fashion, 
the intensity information is contained in a tree data structure. Uniform subregions 
are leaf nodes of the tree, while sub-divided regions are interior nodes. In addition to 
the intensity value of each region, information is transmitted to tell the receiver the 
shape of the tree. For intraframe compression the tree is usually a quad-tree. This 
means that each region is divided into four quadrants, and each interior node has four 
nodes at the level beneath it. In this case, the overhead for tree structure information 
is found to be about 14% on the average. For interframe compression oct-trees, eight 
quad-trees, are sometimes used. 
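
The top-down subdivision can be sketched recursively. The contrast measure used here
(maximum minus minimum intensity within the subregion), the nested-list tree
representation and the function names are assumptions of the sketch; the image side
is taken to be a power of two.

```python
def quadtree(img, top, left, size, threshold):
    """Return a nested tree: a leaf is a mean intensity, an interior node is a
    list of four subtrees in the order upper-left, upper-right, lower-left,
    lower-right."""
    pixels = [img[top + r][left + c] for r in range(size) for c in range(size)]
    if size == 1 or max(pixels) - min(pixels) <= threshold:
        return sum(pixels) / len(pixels)        # low contrast: one value
    half = size // 2
    return [quadtree(img, top + dy, left + dx, half, threshold)
            for dy in (0, half) for dx in (0, half)]


def count_leaves(tree):
    """Number of intensity values that would be transmitted."""
    if not isinstance(tree, list):
        return 1
    return sum(count_leaves(t) for t in tree)
```

Setting the threshold to zero subdivides until every leaf is exactly uniform, which
corresponds to the reversible mode of variable-resolution coding.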

A disadvantage of these two methods of variable-resolution coding is that the 
possible positions of large blocks are fixed so that large uniform regions can be "missed" 
if they are not located and/or oriented properly with respect to the subdivisions of the 
image. 

Compression 

Tree coding generally achieves slightly more compression than MAPS since the 
overhead is lower (14% compared to 25-38%). The IEEE Facsimile Test Chart was 
digitized at 2048 x 2048 pixels and compressed to 0.593 bpp using MAPS. The MSE 
was 0.82%. At compressions of 0.5 - 2.0 bpp, MAPS produced MSE 25% to 50% lower 
than fixed block coding in which blocks of pixels are replaced by the mean without 
regard to contrast. If fine detail can be eliminated, some images can be encoded using 
tree coding at less than 0.2 bpp. An image with low detail can be reversibly encoded 
using tree coding at about 2 bpp, whereas a high detail image might need more than 
9 bpp (higher than PCM). 

Spatial Domain 

Information is lost in the fine detail of low contrast regions. The contrast 
thresholds are adjusted globally and/or locally to ensure that any important detail is 
retained. 

Temporal Domain 
Not Applicable. 

Aesthetic Appearance 

The appropriateness of variable-resolution coding is based on the idea that "when 
an image is viewed as a whole, fine detail is noticed only when it exhibits sharp contrast" 
[59]. Therefore, the loss of detail should not be noticeable if the thresholds are chosen 
properly. The only visible artifact is blockiness when large block sizes are used. 

Spectral Information 
Not Applicable. 

Human Visual System Compensation 

Human Visual System (HVS) compensation techniques attempt to compress 
video images by eliminating any data not perceptible to the human visual system, 
even if they are important from an information theory point of view. Some techniques 
apply a model of the HVS directly to the image data. Many different models have been 
developed to represent as many features of the HVS as possible. They generally include 
linear filters, as well as a non-linear element. Once a model has been chosen, a 
compression technique can be developed that, when applied to the output of the model, 
"loses" information that would be lost by the HVS anyway, e.g., high-frequency signals 
that would be filtered out. At the receiver, the image is restored by applying the inverse 
of the HVS model. 

Other techniques involve taking advantage of specific characteristics of the HVS 
without attempting to explicitly model it. When measured subjectively, the accurate 
reproduction of edges is very important for the reproduction of images of high quality. 
Therefore, one technique concentrates the transmitted data on the reproduction of 
edges at the expense of other features. Such techniques should attain higher 
compression while yielding better quality reconstructed images. 

Many of the HVS compensation techniques involve extracting edge information 
from the video signal. Many methods have been developed for this purpose. Some 
involve separating the contour (high spatial frequency) portion of the video signal from 
the texture (low spatial frequency) portion by linear filtering in the frequency domain. 
Other techniques attempt to categorize each pixel as "edge" or "not edge". This 
categorization is done either by applying local derivative operators and thresholding 
the results or by matching small regions of the image to templates of standard edge 
configurations. These techniques are generally difficult to investigate analytically, 
and they are compared by the results they produce on test images. The application 
of some of these methods to data compression is addressed in the discussion of the 
compression techniques in this section. 

A variety of HVS techniques are described here. It would seem many of these 
techniques consist of independent steps which can be mixed and matched carefully. 
Specifically, the filtering and/or edge detection algorithms are somewhat 
interchangeable between different techniques. This leads to the possibility of a 
compression technique tailored to a specific application. 

Synthetic Highs 

The method of synthetic highs [60][61] involves the separation of the video signal 
into two components, a low spatial-frequency component and an edge component. The 
low-frequency component is extracted by a low-pass filter. This component can be 
sampled at a lower rate than the original signal without loss of information. The edge 
information is extracted by applying a differential operator to the original signal. The 
result is compared to a threshold, and only the location and value of the "important" 
edge pixels are transmitted. This threshold comparison is the only information-lossy 
process in this technique. At the receiver the low-frequency component is interpolated 
to the full resolution. A reconstruction filter is used to synthesize a high-frequency 
signal from the edge information. The two components are then combined to 
reconstruct the video image. 

Compression 

The value of the threshold used for the edge detection provides a trade-off between 
compression ratio and image quality. High quality images were obtained with 
compression ratios ranging from 4 to 23 depending on the image. 

Spatial Domain 

The only information lost is the portion of the high-frequency signal below the 
threshold. This can result in a loss of texture in the reconstructed image. 

Temporal Domain 

Not Applicable. 

Aesthetic Appearance 

The images appear sharp. However, some texture may be missing, as stated 
above. 

Spectral Information 

Not Applicable. 

Pyramid Coding 

Pyramid coding [62] involves separating the video signal into multiple frequency 
bands. Compression is achieved because the high-frequency signals have low entropy 
and are encoded using fewer bits per pixel, and the low-frequency components are 
transmitted using fewer pixels. 

The HVS is less sensitive to contrast errors at high spatial frequencies. Therefore, 
the highest-frequency components which contain the largest number of pixels can be 
quantized with the least levels, thereby significantly reducing the entropy and the bit 
rate. The lower frequency components require more bits per pixel, but the number of 
pixels is so much smaller that the effect on the overall bit rate is insignificant. 

The set of frequency band components is known as a pyramid because each band 
contains fewer samples than the previous higher-frequency band. Each level of the 
pyramid is extracted by low-pass filtering the image and subtracting the low-pass 
component. The remainder is the current level of the pyramid, while the low-pass 
component is used as the starting point for the next iteration in which the filter has 
a lower cutoff frequency. 

To produce the pyramid at high speed, the filtering results from convolution of 
a weighting function with the image. (This is actually a non-causal form of prediction.) 
The sample rate is reduced from level to level, typically by a factor of two in each 
dimension. 

Each level is encoded independently. Significant compression usually is achieved 
at the cost of loss of information by quantizing the components more coarsely than the 
original signal. The number of quantization levels for each component is chosen so 
as to have little effect on the subjective appearance of the image. 

At the receiver the low-pass components are interpolated back to full resolution 
and all of the components are added together to reconstruct the image. This method 
lends itself nicely to progressive transmission schemes since most of the transmitted 
bits are for the high-frequency components, and a blurred version of the image can be 
transmitted at a much lower bit rate. 
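
The analysis and synthesis steps can be sketched in one dimension as follows. This is an illustration under simplifying assumptions, not the filter of [62]: pair averaging stands in for the low-pass weighting function, sample repetition stands in for interpolation, the function names are hypothetical, and the line length is assumed to be divisible by two at every level. No quantization is applied, so the reconstruction is exact; in a real coder each level would be quantized before transmission.

```python
def reduce_level(samples):
    """Low-pass filter and subsample by two (here: average adjacent pairs)."""
    return [(samples[i] + samples[i + 1]) / 2 for i in range(0, len(samples), 2)]


def expand_level(samples):
    """Interpolate back to twice the length (here: simple sample repetition)."""
    return [s for s in samples for _ in range(2)]


def build_pyramid(samples, levels):
    """Return the band levels (finest first) followed by the final low-pass residue."""
    pyramid = []
    current = list(samples)
    for _ in range(levels):
        low = reduce_level(current)
        # The remainder after subtracting the low-pass component is this pyramid level.
        pyramid.append([c - e for c, e in zip(current, expand_level(low))])
        current = low  # starting point for the next, lower-frequency iteration
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Interpolate the low-pass residue upward, adding each band level back in."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = [e + b for e, b in zip(expand_level(current), band)]
    return current


line = [12, 14, 13, 15, 80, 82, 81, 79]
pyramid = build_pyramid(line, levels=2)
print([len(level) for level in pyramid])
print(reconstruct(pyramid) == line)
```

Sending `pyramid[-1]` first and the finer levels afterwards corresponds to the progressive transmission scheme mentioned above.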

Compression 

Data rates from 0.7 to 1.6 bpp were reported with normalized MSE of less than 1%. 

Spatial Domain 

The only errors result from the quantization noise added to each frequency 
component. The amount of noise in each frequency band is controlled by the number 
of quantization levels used. 

Temporal Domain 

Not Applicable. 

Aesthetic Appearance 

Most of the quantization noise occurs in the high spatial frequency ranges where 
the human visual system is less sensitive. Therefore, the subjective quality of the 
reconstructed images is very high. 

Spectral Information 

Not Applicable. 

Region Growing 

Region growing techniques [63] [64] also separate the contour information from 
the texture information. The pixels of the image are first divided into regions. The 
boundaries of the regions are defined as contours. The contours and the texture 
information are transmitted separately and combined at the receiver. 

A region is defined by a property common to all the pixels within it. The ideal 
property ensures the boundaries of the regions correspond to physical boundaries of 
the objects being imaged. In practice, however, this is not achievable. Instead, the 
property is usually a range within which all the intensity values must lie. 

The regions are grown by starting with one pixel and adding all surrounding 
pixels having the correct property. When no more pixels can be added to a region, a 
new region is started. This procedure continues until every pixel belongs to a region. 
(Some of the regions may contain only one pixel.) The pixels on the boundary of a 
region are treated as contour pixels. The image is processed to reduce the width of 
the contours from two pixels to one by putting some of the contour pixels back into the 
interior of the regions. To reduce the complexity, small regions are merged into larger 
ones and regions which are similar along their common boundary can be combined. 
These operations will most likely not eliminate contours having physical significance. 

The contours are coded using a contour coding scheme (see Contour Coding under 
Reversible Image Compression). The remaining texture information is coded by 
using an approximating function for each region. Since sharp discontinuities will not 
appear within a region, low-order polynomials are usually reasonable approximations. 
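
A minimal sketch of the growing step, assuming 4-connectivity and taking the region property to be an intensity range of plus or minus `tolerance` around the first (seed) pixel of each region. The contour thinning, region merging and polynomial texture fitting described above are omitted, and `grow_regions` is a hypothetical name.

```python
def grow_regions(image, tolerance):
    """Label 4-connected regions whose pixels lie within `tolerance` of the seed pixel."""
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] is not None:
                continue  # pixel already belongs to a region
            seed = image[r0][c0]
            labels[r0][c0] = next_label
            stack = [(r0, c0)]
            while stack:
                r, c = stack.pop()
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] is None
                            and abs(image[nr][nc] - seed) <= tolerance):
                        labels[nr][nc] = next_label
                        stack.append((nr, nc))
            next_label += 1  # no more pixels can be added, so start a new region
    return labels, next_label


image = [
    [10, 11, 10, 50],
    [11, 10, 52, 51],
    [10, 12, 50, 90],
]
labels, region_count = grow_regions(image, tolerance=3)
for row in labels:
    print(row)
print(region_count)
```

The isolated pixel of value 90 forms a one-pixel region, the case noted in the text; a merging pass would normally absorb such regions into a neighbor.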

Compression 

Good reproductions of an image can be obtained at a compression ratio of about 
30:1. Higher compression can be achieved if the image is composed of a small number 
of large regions of near-uniform intensity. 

Spatial Domain 

It is assumed that very small regions do not represent physical objects. If they 
do, these small objects and/or fine details may be lost. 

Temporal Domain 

Not Applicable. 

Aesthetic Appearance 

The reconstructed picture has sharp edges and the proper texture. Degradation 
will be obvious if the original image contained much fine detail which is subjectively 
significant, e. g., facial features on people at a distance. 

Spectral Information 

Not Applicable. 

Directional Decomposition 

Directional Decomposition [65] is based on evidence the HVS contains direction 
sensitive cells which extract features at specific orientations. Imitation of this 
directional decomposition should not introduce large subjective errors. The 
compression technique involves decomposing the image into a low-pass component 
and N (typically 16) components which are high-pass in a specific direction and 
low-pass in all the other directions. The decomposition is done in the 2-D Fourier 
domain by multiplying the transform of the image with directional filter transfer 
functions. Each component is then inverse transformed to the spatial domain. 

As in the other techniques in this section, the low-pass component is 
under-sampled and transmitted. The directional high-pass components are used for 
edge detection. The edge detection is performed most effectively by producing an 
isotropic high-pass image consisting of a combination of the directional components. 
The pixels are compared to a threshold and selected to retain only the "important" 
edges. Every edge pixel found in the isotropic image is then classified into one of the 
directions based on a comparison of the form of the signal in the vicinity of that pixel 
in each directional component. 

The edge information of each directional component is encoded independently. 
The location of the edge points can be encoded by sub-sampling along the edge since 
each component is low-pass along the direction of the edges. This additional 
compression is the advantage of directional decomposition. Some information about 
the profile of the edge must also be coded and transmitted. This can be done by 
modelling the profile with a simple analytic function and transmitting some 
parameters of the function. 

Since the directional filters used for decomposition are not perfectly sharp, some 
edges will appear in more than one directional component. This redundancy can be 
reduced by using a form of prediction between components. 

At the receiver the low-pass component is interpolated to full resolution. Each 
of the high-pass directional components is synthesized from the edge location and 
profile information. All of the components are combined to form the reconstructed 
image. The relative weight of the high-frequency portion of the image can be adjusted 
to control the sharpness of the edges. 

Compression 

Reasonable quality images can be obtained at compression ratios up to 60:1 for 
8-bit originals. Somewhat blurred, but still easily recognizable, images can be obtained 
at compression ratios in the range of 90:1 to 120:1. 

Spatial Domain 

Since both the high and low spatial frequency components are compressed, 
degradations can result from either or both of these components. Loss of detail is the 
major degradation. The amount of lost detail increases with the compression ratio 
since the thresholds are increased in order to obtain more compression. 

Temporal Domain 

Not Applicable. 

Aesthetic Appearance 

By assigning a large weight to the high-frequency component, the image can be 
made sharp even at high compression. However, the loss of detail will still be 
noticeable. 

Spectral Information 

Not Applicable. 

Anisotropic Nonstationary Predictive Coding 

Anisotropic Nonstationary Predictive Coding [66] is basically a predictive 
technique. The prediction uses a combination of a low-pass filter, a high-pass filter, 
and an anisotropic filter to enhance edges. These filters are weighted with 
non-stationary weighting functions and linearly combined to form the actual prediction 
filter. In one scheme, the difference signal is coded using a Discrete Cosine Transform 
on each row. In addition to the difference signal, the weighting functions must also 
be transmitted. These functions are low-pass so they are under-sampled at 1:6 and 
quantized fairly coarsely. The high degree of adaptivity of the predictor to local 
anisotropies allows the difference signal to also be encoded with few bits. 

Compression 

Good quality images can be obtained with compression ratios as high as 30:1 
from 8-bit originals, although 20:1 is more typical. 

Spatial Domain 

This technique does not handle fine texture well at low bit rates. However, if the 
texture is not important, it can be filtered out before encoding. Otherwise, the major 
degradation is wide band quantization noise. 

Temporal Domain 

Not Applicable. 

Aesthetic Appearance 

The reproduced edges are sharp and coarse texture is reproduced well. The visible 
errors are in highly detailed regions. Without close inspection, these errors are not 
highly visible. 

Spectral Information 

Not Applicable. 

Minimum Noise Visibility Coding 

One effective way to take advantage of HVS characteristics is to "hide" the 
quantization noise by moving it to areas where it is less visible. As a result, lower bit 
rates can be used to achieve comparable quality images. Minimum Noise Visibility 
Coding (MNVC) uses two effects to hide the noise. One is that noise is more visible in 
darker areas than in lighter areas. However, there exists a scale, called the "lightness" 
scale, on which equal increments are equally visible. Therefore, the luminance values 
are transformed to the "lightness" scale prior to quantization in order to distribute the 
noise evenly over the intensity range in terms of visibility. The other effect is less 
visibility of noise in areas of high detail, e. g., near high-contrast edges. Figure 5 
illustrates the MNVC concept. 



Figure 5: A Minimum Noise Visibility Coding encoder 

As with many other HVS compensation techniques, MNVC [67] involves 
separating the high and low spatial frequency components. The low frequency 
component is sub-sampled at 1:5 and transmitted with the full 8-bit representation. 
The high frequency component is quantized at a much lower bit rate per pixel. First, 
however, pseudo-random noise (PRN) is added in order to eliminate the correlation 
that would exist in normal quantization noise. Then a tapered quantizer is used to 
place more error in high detail areas in order to take advantage of the property of the 
HVS mentioned above. 
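
The role of the pseudo-random noise can be sketched as follows, under assumptions not stated in the source: a uniform quantizer stands in for the tapered one, the noise is uniform over one quantization step, and transmitter and receiver regenerate the same sequence from a shared seed so that the receiver can subtract it. All names are hypothetical.

```python
import random


def quantize(value, step):
    """Uniform quantizer (an assumption; the source uses a tapered quantizer)."""
    return step * round(value / step)


def prn_sequence(length, step, seed):
    """Pseudo-random noise, reproducible at both ends from a shared seed."""
    rng = random.Random(seed)
    return [rng.uniform(-step / 2, step / 2) for _ in range(length)]


def encode_highs(highs, step, seed):
    noise = prn_sequence(len(highs), step, seed)
    return [quantize(h + d, step) for h, d in zip(highs, noise)]


def decode_highs(sent, step, seed):
    noise = prn_sequence(len(sent), step, seed)
    return [s - d for s, d in zip(sent, noise)]  # the receiver subtracts the PRN


highs = [0.0, 3.2, -7.9, 15.5, 1.1, -0.4]
step = 4.0
decoded = decode_highs(encode_highs(highs, step, seed=1), step, seed=1)
worst_error = max(abs(a - b) for a, b in zip(highs, decoded))
print(worst_error <= step / 2 + 1e-9)
```

With the noise added before quantization and subtracted afterwards, the remaining error stays within half a step but no longer follows the signal, which is the decorrelation the text describes.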

Color images are encoded in the YIQ format. The Y (luminance) signal is encoded 
identically to a monochrome signal. The I and Q (chrominance) signals are sub-sampled 
in the same manner as the low frequency luminance component. 

At the receiver the PRN is subtracted from the high frequency component. The 
low frequency component is interpolated to full resolution, and the two components 
are combined to produce the reconstructed image. 

Compression 

Using standard television resolution, encoding the high frequency luminance 
component at 4 bpp produced images subjectively as good as the original 8 bpp signal. 
The low frequency and chrominance components each use 0.53 bpp, so the overall 
result was 5.6 bpp for a color signal and 4.5 bpp for monochrome. MNVC is a 
low-compression technique where the emphasis is on invisibility of the quantization 
noise when the bit rate is reduced. It was found to produce subjectively less degradation 
than DPCM for a given bit rate. 
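
The quoted totals are consistent with simple addition of the component rates, assuming the color total counts the low frequency luminance, I and Q components at 0.53 bpp each:

```latex
\begin{aligned}
\text{monochrome:}\quad & 4 + 0.53 = 4.53 \approx 4.5\ \text{bpp} \\
\text{color:}\quad & 4 + 3\,(0.53) = 5.59 \approx 5.6\ \text{bpp}
\end{aligned}
```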

Spatial Domain 

The degradations involve quantization noise in the high frequency component 
and under-sampling in the low frequency component. The latter is not significant due 
to the sampling theorem. 

Temporal Domain 

The PRN that is added to the high frequency signal can produce a "dirty window" 
effect if it is synchronized to the frame rate. Using totally unsynchronized PRN makes 
the image appear more noisy. This trade-off has to be optimized. 

Aesthetic Appearance 

The number of bits per pixel mentioned above yields a reconstructed image whose 
noise is essentially invisible to a human observer. As the number of bits per pixel 
decreases, results similar to those from DPCM become likely. 

Spectral Information 

Under-sampling the I- and Q-components at 5:1 had little effect on the overall 
quality of the color image. 

Constant Area Quantization 

Constant Area Quantization (CAQ) [68][69][70] is a predictive technique similar 
to Delta Modulation. Unlike DM, however, the difference is compared to a threshold; 
and if the difference is small enough, a 0 is transmitted. If the magnitude of the difference 
is above threshold, either a P or N is produced to indicate positive or negative change. 
The technique is called Constant Area Quantization because the threshold, as well as 
the positive and negative quantization levels, are adjusted to keep the area, A, (distance 
x luminance) under the triangle from the previous P or N to the current one constant. 
This is equivalent to setting the threshold equal to A/n, where n is the number of pixels 
since the previous P or N was produced. This scheme provides high resolution for high 
contrast regions and high compression for low contrast regions. The motivation for 
CAQ is the property of human vision where the eye sees more detail in high contrast 
regions. 

A number of modifications have been made to CAQ in order to reduce the error 
and/or the bit rate. One modification makes the area threshold adaptive to the detail 
of the image. Another possibility is to have a 2-D predictor instead of using the 
value at the last P or N as the prediction. This is called Predictive CAQ (PCAQ). The 
entropy of the output is reduced by 30% by providing a better prediction, thereby reducing 
the number of Ps and Ns. A third possibility for reducing the error incorporates 
overshoot into the scheme, i. e., to make the quantization levels larger than the 
thresholds. This allows the reconstructed signal to follow the original signal more 
closely. Combining a Hadamard transform in the perpendicular (vertical) direction 
with the basic CAQ will also reduce the bit rate by taking advantage of the correlation 
in that direction. 
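
The basic CAQ rule, without the modifications listed above, might be sketched as follows. Several details are assumptions made for illustration: the first sample is taken as the starting prediction, the reconstruction step on a P or N equals the current threshold A/n (no overshoot), and `caq_encode` is a hypothetical name.

```python
def caq_encode(signal, area):
    """Return the 0/P/N symbol stream and the reconstructed signal for basic CAQ."""
    prediction = signal[0]  # assumption: the first sample is known to the receiver
    n = 0                   # pixels since the previous P or N
    symbols = []
    reconstructed = []
    for sample in signal[1:]:
        n += 1
        threshold = area / n  # threshold shrinks as the run of 0 symbols grows
        difference = sample - prediction
        if abs(difference) <= threshold:
            symbols.append("0")
        else:
            # Assumption: the quantization level equals the threshold (no overshoot).
            prediction += threshold if difference > 0 else -threshold
            symbols.append("P" if difference > 0 else "N")
            n = 0
        reconstructed.append(prediction)
    return symbols, reconstructed


flat = [10] * 6
edge = [0, 0, 0, 100, 100, 100]

print(caq_encode(flat, area=12))
print(caq_encode(edge, area=12))
```

On the flat run only 0 symbols are produced, while after the large edge the reconstruction climbs in steps and stays below the input, which illustrates the lag discussed for basic CAQ.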

Some of the great advantages of the basic CAQ are its minimal complexity, power 
consumption, and cost. The method was originally designed for a remotely piloted 
vehicle where these factors are critical. Adding overshoot does not increase the 
complexity, whereas improving the predictor does. However, with the improvements 
in hardware in the last decade, all of the variations considered above should be practical 
even under severe power and cost constraints. 

Compression 

The basic CAQ will produce an output signal having a maximum zeroth-order 
entropy of 1.58 (= log_2 3) bpp. Typically, the entropy is about 1.1 bpp since 0 is more 
common than P or N. Huffman coding is used to approach the actual entropy values. 
When the entropy is near or below 1 bpp, Huffman coding should be implemented on 
blocks of 2 or more pixels in order to include code symbols that require less than 1 bpp. 
Using a high detail test image, the MSE was about 3% at 1.08 bpp. Introducing 
overshoot produced an MSE of less than 2% at about 1.2 bpp. PCAQ resulted in an 
MSE of about 1% at 1.3 bpp and about 1.5% at 1.0 bpp, using the same test image. 
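
The entropy bound and the benefit of coding pixel pairs can be checked numerically. The symbol probabilities below are illustrative assumptions (not measured CAQ statistics), the symbols are treated as independent, and the helper names are hypothetical. The Huffman average length is computed as the sum of the merged node probabilities, which equals the expected code length.

```python
import heapq
import itertools
import math


def entropy(probabilities):
    """Zeroth-order entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def huffman_average_length(probabilities):
    """Expected Huffman code length in bits per coded symbol."""
    heap = [(p, index) for index, p in enumerate(probabilities)]
    heapq.heapify(heap)
    tie_breaker = itertools.count(len(probabilities))
    total = 0.0
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        total += p1 + p2  # every merge adds one bit to all symbols beneath it
        heapq.heappush(heap, (p1 + p2, next(tie_breaker)))
    return total


# Illustrative probabilities for the symbols 0, P and N.
single = [0.8, 0.1, 0.1]
pairs = [a * b for a in single for b in single]  # blocks of two pixels

max_entropy = math.log2(3)
single_bpp = huffman_average_length(single)
pair_bpp = huffman_average_length(pairs) / 2

print(round(max_entropy, 2))
print(entropy(single), single_bpp, pair_bpp)
```

A code over single symbols cannot spend less than 1 bit per pixel, whereas the pair code can, which is why block coding is recommended when the entropy is near or below 1 bpp.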

Spatial Domain 

Basic CAQ has the same type of problems as DM, i. e., not being able to handle 
both large slopes and low contrast regions. Also, the reconstructed signal always lags 
the original and contains blurring of edges. Large slopes and low contrast regions are 
reduced somewhat by using PCAQ to provide a better prediction or by making the 
method adaptive. The lag and blur are relieved by using overshoot. 

Temporal Domain 

Not Applicable. 

Aesthetic Appearance 

The images produced by the basic CAQ appear blurry. However, it is still possible 
to use the images for object recognition. PCAQ produced a much clearer image. 

Spectral Information 

Not Applicable. 

Perceptual Space Coding 

A method which minimizes the perception of distortion takes advantage of the 
HVS model. Perceptual Space Coding [71][72][73] attempts to minimize the distortion 
in a "perceptual" space, i. e., in a domain obtained by passing the image through the 
model of the HVS. 

A commonly used model of the HVS involves a linear transformation of the input 
color components (RGB, YIQ, etc.) into a luminance component and the two color 
components that correspond to the types of cones in the retina. Each component passes 
through a logarithmic compression in an attempt to model the neural response of the 
receptors of the retina. The next step in the HVS involves what are known as 
color-opponent cells. The action of these cells is modelled by two weighted linear 
difference circuits which subtract weighted multiples of the chrominance signals from 
the luminance signal to produce the perceptual chromatic information. Each 
perceptual component is then passed through a linear band-pass filter which replicates 
the lateral inhibition mechanism of the ganglion cells. 

Using this model, the distortion criterion is the MSE in the perceptual space at 
the output of the system. To compress the image, it is first passed through the system, 
i. e., converted to the perceptual space. The linear filter is implemented by 
multiplication in the spatial frequency domain. The Fourier coefficients, at the filter’s 
output, are then quantized with fewer bits than the original image using a method 
which minimizes the MSE. The bit allocation for the Fourier coefficients is determined 
by the power spectra of all three components so that more bits are used for the 
coefficients that contain more power. 

At the receiver the inverse of the HVS model is applied. The reconstructed image 
may have a relatively high MSE in the image domain. However, the errors are "hidden" 
so that they are not highly noticeable. Therefore, the subjective image quality is much 
higher than other images with the same MSE. 

If the image is monochrome, only the logarithmic compression and bandpass filter 
steps of the HVS model are used. Otherwise, the method remains the same. 

Compression 

A low detail 512 x 512 monochrome image was encoded at 0.1 bpp with an MSE 
in the image domain of 0.72%. This image was of usable quality. A number of color 
images of varying detail were encoded at 0.25 bpp with MSEs ranging from 0.36% to 
3.3%. These images were reported to be of excellent quality. 

Spatial Domain 

The errors are due to quantization errors in the spatial frequency domain. Since 
the transform is done on the entire image, no blockiness should occur. 

Temporal Domain 

Not Applicable. 

Aesthetic Appearance 

The degradation of quality with reduced bit rates was reported to be "graceful". 

Spectral Information 

No specific information was reported. 

Transform Coding 

Transform coding is an information-lossy technique which uses a mathematical 
operator on the data representing a digital image. The input data are typically highly 
correlated, so the goal of transform coding is to derive an array of uncorrelated or 
nearly uncorrelated data from the input data. A typical transform coding scheme is 
shown below. 


[Block diagram relating the image block f(x,y) to the coefficient array F(u,v)] 

Figure 6: The process of Transform Coding 

The mathematical operators used in transform coding form a complete 
orthogonal set of basis vectors. A complete set of basis vectors allows the original data 
to be described as a linear combination of all basis vectors in the set. Orthogonality 
implies, in a given set of basis vectors, any one basis vector cannot be represented as 
a linear combination of the other basis vectors. In other words, each of the basis vectors 
is unique. The Fourier transform is a very familiar transform whose basis vectors are 
complex exponentials. 

A block of digital image data, f(x,y), in an M x N array can be expressed as a 
new M x N array, F(u,v), via a two-dimensional transform according to the relationship 

$$F(u,v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, \phi_{u,v}(x,y)$$

where $\phi_{u,v}(x,y)$ are a set of orthogonal basis vectors. 

The image data are fully recoverable using the inverse transform 

$$f(x,y) = \sum_{u=0}^{M-1} \sum_{v=0}^{N-1} F(u,v)\, \psi_{u,v}(x,y)$$

where, again, $\psi_{u,v}(x,y)$ represents the transform kernel or basis vectors for the image 
space. 
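
As a concrete illustration of the transform pair, the sketch below uses an orthonormal discrete cosine basis on a small square block. The choice of the DCT, the separable form of the kernel, the function names and the sample data are assumptions for illustration; because this basis is real and orthonormal, the same basis vectors serve in both the forward and the inverse sum.

```python
import math


def dct_basis(n):
    """Orthonormal DCT-II basis: row u holds basis vector u sampled at x = 0..n-1."""
    basis = []
    for u in range(n):
        scale = math.sqrt((1 if u == 0 else 2) / n)
        basis.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                      for x in range(n)])
    return basis


def forward(block, basis):
    """F(u,v) = sum over x, y of f(x,y) * b_u(x) * b_v(y)."""
    n = len(block)
    return [[sum(block[x][y] * basis[u][x] * basis[v][y]
                 for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def inverse(coeffs, basis):
    """f(x,y) = sum over u, v of F(u,v) * b_u(x) * b_v(y)."""
    n = len(coeffs)
    return [[sum(coeffs[u][v] * basis[u][x] * basis[v][y]
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]


block = [
    [52, 55, 61, 66],
    [63, 59, 55, 90],
    [62, 59, 68, 113],
    [63, 58, 71, 122],
]
basis = dct_basis(4)
coeffs = forward(block, basis)
restored = inverse(coeffs, basis)

max_error = max(abs(restored[x][y] - block[x][y]) for x in range(4) for y in range(4))
print(coeffs[0][0], max_error)
```

With this normalization the first coefficient equals N times the mean of the block, and without quantization the block is recovered to within floating-point rounding.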

It is common to think of transform coefficients as representing a "frequency 
domain", although this term is truly accurate only in the case of the Fourier transform. 
Some authors have coined the word "sequency" to describe the pseudo-frequency 
domain behavior of the other transforms. 

A transform acts to "pack" a large number of highly correlated image data samples 
into a smaller number of uncorrelated coefficients. If the image’s information is 
analogous to the energy stored in a mechanical system, then transforms pack diffuse 
energy into more compact energy "packets". The majority of the energy, the average, 
is packed into the first "packet". In discussing images, this first value is known as the 
DC coefficient. This term represents the average intensity, or value, of the pixels in 
the image block. The remaining terms are known as AC coefficients. The amount of 
the image’s information in each packet, or the amount of correlation between pixels, 
decreases as the order of the coefficients increases. For most transform kernels, this 
order is the "sequency" order. Many bit allocation and quantization schemes 
(digitization) act to reduce or filter this high "sequency" data. The effect is analogous 
to frequency filtering in electrical and electronic systems. 

In broadcasting and other types of electronic communication, a technique known 
as pre-emphasis/de-emphasis is employed to overcome deleterious effects of bandwidth 
compression. This electronic processing has proven beneficial in image processing. 
Electronic pre-emphasis is added to the analog image signal prior to digitization. At 
the receiver, electronic de-emphasis is added after the digital, reconstructed image is 
converted to the analog domain. In addition, an optical filtering process is employed 
with spatial bandpass filtering prior to the video camera’s focal plane. The section on 
optical implementations of VDC in HHVT discusses this second method. 

Both techniques help to reduce the blockiness and edge busyness associated with 
transform methods. They contribute very little to implementation complexity, while 
increasing the effectiveness of image compression. Techniques such as these should 
be an integral part of any image compression or processing scheme. 

The transform is a reversible process as long as no quantization of the coefficients and 
no data compression occur. Both of these processes will introduce distortion. Also, 
usually M = N. 

Quantization of coefficients can be performed using a uniform quantizer, an 
optimal quantizer, a compander/expander or by adaptive methods. The quantization 
process is the key step in a transform coding algorithm since it is here that data compression 
is performed and output image fidelity is affected. F(u,v), known as the coefficient 
array, and f(x,y), the original image block, are both N x N arrays with n bits per array 
element or N^2 x n bits total. Data compression of the F(u,v) array is achieved by reducing 
the number of bits used to quantize some of the coefficients. A common feature of the 
various transform coding algorithms is that many of the coefficients will be of very small 
magnitude. These coefficients are thus coarsely quantized or even omitted, with 
negligible effect on image quality. 

This process will result in a compressed coefficient array F'(u,v) which may be 
further compressed by entropy coding the coefficients. The result will be transmitted 
over the communication link with some type of error correcting code. At the receiver, 
decoding followed by the inverse transformation (and de-emphasis) will be performed 
on the F'(u,v) array, resulting in $\hat{f}(x,y)$, the output image. 

The following sections describe the various parameters of transform coding 
systems. It is assumed we are dealing with square images, e. g., 512 x 512 or 
1024 x 1024 pixels. 

Block Size and Dimensionality 

Transforms are performed on line segments of length N of an image (one 
dimensional), on N x N blocks of the image (two dimensional), or on N x N x M blocks 
of an image sequence (three dimensional). Often, the third dimension is time. One 
and two dimensional transforms are intraframe coding techniques, whereas three 
dimensional transforms are interframe coding techniques. 

One dimensional transforms are usually performed on the horizontal lines of an 
image since data are often raster scanned horizontally. A whole line or a segment of 
a line may be used. One dimensional transform coding has advantages over 
multidimensional transform coding of speed and implementation simplicity. However, 
one dimensional transforms are inefficient since they only take advantage of spatial 
correlation of an image in the horizontal direction. Thus, under the same image fidelity 
criteria, a one dimensional transform coder will not be able to achieve as high a 
compression ratio as the multidimensional transforms. 

In practice, one dimensional transform coding is seldom implemented by itself. 
A common technique uses transform coding in the horizontal direction and predictive 
coding in the vertical and temporal directions. 

Two dimensional transform coding usually begins by subdividing the whole image 
into N x N sub-blocks. Transform coding is then performed on the sub-blocks. Two 
dimensional transforms take full advantage of intraframe correlation in both 
directions, unlike one dimensional transforms. They are not as fast since N lines of 
the image must first be stored before the transform process can begin, and they are 
somewhat more complex to implement. Usually, the increased coding efficiency more 
than offsets the other factors. 

The choice of block size (the choice of N) is an important parameter in two 
dimensional transform coding schemes. For simplicity, N is always chosen as an 
integer power of 2. Large and small values of N each have unique advantages and 
disadvantages. Transform coding schemes with large values of N perform better under 
a given fidelity criterion at high compression ratios [74]. These schemes take 
advantage of correlation over a larger area. Those with smaller values of N are faster, 
easier to implement and more receptive to adaptivity. One author [75] suggests the 
intraframe correlation for most images is negligible beyond a spatial distance of 20 
pixels and recommends smaller blocks. 4 x 4, 8 x 8, 16 x 16, 32 x 32 and 64 x 64 
transform coding schemes have been implemented in cases appearing in the literature. 

Three dimensional transform coding schemes subdivide a sequence of image 
frames into N x N x M blocks. They are the most efficient of transform coding schemes 
since they fully exploit correlation in the spatial and temporal directions. However, 
they are high in implementation complexity. Also, they are much slower since M-1 
frames plus N lines of the image sequence must be stored before the transform coding 
can start. This decreases their attractiveness for use in real-time systems. 
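
The storage that must be filled before coding can begin follows directly from the block dimensions. A small sketch with hypothetical helper names, counting buffered pixels for a raster-scanned source:

```python
def pixels_buffered_2d(line_length, n):
    """Pixels held before an N x N transform can start: N full lines."""
    return n * line_length


def pixels_buffered_3d(line_length, lines_per_frame, n, m):
    """Pixels held before an N x N x M transform can start: M-1 frames plus N lines."""
    return (m - 1) * line_length * lines_per_frame + n * line_length


# Example: 512 x 512 frames, 8 x 8 blocks, and 8 x 8 x 4 blocks.
print(pixels_buffered_2d(512, 8))
print(pixels_buffered_3d(512, 512, 8, 4))
```

The three-dimensional case is dominated by the M-1 stored frames, which is the main obstacle to real-time use noted above.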

To minimize complexity, small values of N and M are chosen. Research indicates 
higher values of M (i.e., coding more frames at a time) result in a lower MSE [76], but 
choosing M = N allows for vector processing to speed the address evaluations of three 
dimensional arrays [77]. Block sizes of 4 x 4 x 4, 8 x 8 x 4 and 4 x 4 x 8 are most 
common in the literature [78]. 

Quantization and Bit Allocation 

Once the array of transform coefficients is determined, the next task is to quantize 
them for storage or transmission. Data compression is achieved by deleting low 
magnitude coefficients and coarsely quantizing others. Coarse quantization assigns fewer 
bits to these coefficients. There are several methods for accomplishing this. 

One method is known as threshold sampling or magnitude sampling. By this 
method, all coefficients above a certain magnitude are retained while those below the 
threshold are deleted. This results in an address set, F'(u,v), given by 

$$F'(u,v) = \{F(u,v) : |F(u,v)| > T\},$$

where T is the predetermined threshold. 

Extra bits are required to address the transmitted coefficients [79], so magnitude 
sampling requires overhead. However, magnitude sampling adapts to the local 
statistics of the image, and thus performs well. The transmitter’s coding operation is 
independent of the receiver’s decoding operation. Also, this overhead is often much 
less than the overhead required to transmit a codebook in other block based methods. 
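
A minimal sketch of threshold sampling on one coefficient block. The coefficient values and function names are invented for illustration; the (u, v) tag carried with each retained value corresponds to the addressing overhead noted above.

```python
def threshold_sample(coeffs, threshold):
    """Keep coefficients with |F(u,v)| > T, each tagged with its (u, v) address."""
    return [((u, v), value)
            for u, row in enumerate(coeffs)
            for v, value in enumerate(row)
            if abs(value) > threshold]


def rebuild(kept, n):
    """Receiver side: place retained coefficients back, zero elsewhere."""
    coeffs = [[0.0] * n for _ in range(n)]
    for (u, v), value in kept:
        coeffs[u][v] = value
    return coeffs


coeffs = [
    [279.0, -48.5, 12.0, -3.1],
    [-30.2, 21.7, -6.0, 1.9],
    [8.4, -5.5, 2.2, -0.7],
    [-2.0, 1.1, -0.6, 0.3],
]
kept = threshold_sample(coeffs, threshold=6.0)
print(len(kept), kept)
print(rebuild(kept, 4))
```

Because the selection depends only on the coefficients of the block at hand, the set of retained addresses changes from block to block, which is the adaptivity to local statistics described above.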

Another quantization method, zonal sampling, relies on the variance of the 
transform coefficients. The variance of the coefficients decreases for the higher 
sequency coefficients, and the coefficients with the largest variances contribute most 
to the reconstructed image [80]. Zonal sampling assumes a given coefficient’s variance 
will be constant for a given class of images, even though its amplitude will fluctuate. 
This allocation scheme relies on average statistics across a class of images. However, 
it is suboptimal since it does not adapt to the local statistics of each individual image 
as was the case with threshold sampling. The address set, F'(u,v), for zonal sampling 
is obtained by 

$$F'(u,v) = \{F(u,v) : \sigma^2_{u,v} > V_T\},$$

where $\sigma^2_{u,v}$ is the variance of coefficient F(u,v) and $V_T$ is the minimum variance with which a coefficient is retained. 
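
A minimal sketch of zonal selection, assuming the variances are estimated from a small training set of coefficient blocks representing one class of images. The data, the use of the population variance, and the names are illustrative assumptions.

```python
def coefficient_variances(blocks):
    """Per-position variance of F(u,v) across a training set of coefficient blocks."""
    n = len(blocks[0])
    count = len(blocks)
    variances = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            values = [block[u][v] for block in blocks]
            mean = sum(values) / count
            variances[u][v] = sum((x - mean) ** 2 for x in values) / count
    return variances


def zonal_mask(variances, minimum_variance):
    """True where a coefficient is retained, fixed in advance for the whole class."""
    return [[var > minimum_variance for var in row] for row in variances]


training_blocks = [
    [[200.0, 30.0], [20.0, 1.0]],
    [[120.0, -25.0], [-15.0, -1.0]],
    [[160.0, 10.0], [5.0, 0.5]],
]
variances = coefficient_variances(training_blocks)
mask = zonal_mask(variances, minimum_variance=100.0)
print(variances)
print(mask)
```

Unlike threshold sampling, the mask is the same for every block, so no addresses need to be transmitted, but a large coefficient that falls outside the zone in a particular block is lost.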

In addition, since zonal sampling is based on average statistics of a class of images, 
a bit allocation map may be developed a priori which results in coarser quantization 
based on smaller variance (dynamic range). The DC coefficient (representing the 
average value) has the largest variance and will be quantized with a full eight bits, 
but other coefficients will receive fewer and fewer bits based on smaller and smaller 
variances. These zones, within which coefficients share the same number of bits, may 
be determined empirically or using a numerical fidelity criterion such as 


where n_b(u,v) is the number of bits assigned to coefficient F(u,v), trunc represents the 
truncation operator, and D is the maximum distortion penalty of quantizing with n_b 
bits [81]. The result of a zonal sampling algorithm is a bit allocation map such as the 
one shown below in Figure 7. 

8765433221110000 
7665433221110000 
6554433221110000 
5544433322111000 
4443333222111000 
3333322222111100 
3333322221111100 
2223222111111000 
2222322111111100 
2222221111111000 
1111111111100000 
1111111111100000 
1111111111100000 
1111111111000000 
1111111111000000 
1111111111000000 

Figure 7: Bit Allocation Map from Zonal Sampling 
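
The average bit rate implied by a bit allocation map is the sum of its entries divided by the number of coefficients. The sketch below applies this to the map of Figure 7, transcribed as 16 rows of 16 single-digit entries; the variable names are hypothetical.

```python
BIT_MAP_ROWS = """
8765433221110000
7665433221110000
6554433221110000
5544433322111000
4443333222111000
3333322222111100
3333322221111100
2223222111111000
2222322111111100
2222221111111000
1111111111100000
1111111111100000
1111111111100000
1111111111000000
1111111111000000
1111111111000000
""".split()

# Each character is the number of bits assigned to one coefficient.
bit_map = [[int(ch) for ch in row] for row in BIT_MAP_ROWS]

total_bits = sum(sum(row) for row in bit_map)
coefficient_count = sum(len(row) for row in bit_map)
average_bpp = total_bits / coefficient_count
discarded = sum(row.count(0) for row in bit_map)

print(total_bits, coefficient_count, average_bpp, discarded)
```

Comparing `average_bpp` with the 8 bits of the original samples gives the compression obtained from the allocation alone, before any entropy coding.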

Once the coefficients and their respective bit allocations are known, quantization 
is performed. This can be accomplished with either a uniform or nonuniform quantizer. 
Statistically, it has been shown the DC coefficient is best approximated by a Gaussian 
distribution and the remaining coefficients are best approximated by a Laplacian 
distribution [82]. Therefore, optimally, the DC coefficient would be quantized with a 
Gaussian quantizer and the remaining coefficients with a Laplacian quantizer. 

The quantization and bit allocation process results in data compression, but this 
process also introduces distortion in the reconstructed image, $\hat{f}(x,y)$. The mean square 
error resulting from quantization is given by 

$$\varepsilon^2 = E\left\{ \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} [f(x,y) - \hat{f}(x,y)]^2 \right\}$$

where E represents the expectation operator. 

For the same mean square error, zonal sampling results in reconstructed images 
more objectionable to subjective human observers than threshold sampling [83]. The 
quantization noise from zonal sampling is more noticeable than the low pass filtering 
from threshold sampling. This implies a complexity vs. performance trade-off, which 
may be optimized with a hybrid sampling technique [84]. 

Color Signal Transform Coding 

Color images may be transform coded by coding the individual color components 
separately or coding the composite signal. For digital video systems, component coding 
is more desirable [85]. Even though the component system conversions may cause 
additional degradation, the better coding efficiency usually compensates for this [86]. 

YIQ co-ordinate conversion provides almost as high an energy compactness for 
color images as does the Karhunen-Loeve Transform (KLT) color co-ordinate 
conversion [87]. By its definition, the KLT (or principal components) produces the 
most uncorrelated, and therefore optimal, transform coefficients. 

For component coding, the YIQ signals are available after a component conversion 
process. One wishes to transform code with this component system, since these three 
signals are nearly uncorrelated (unlike the RGB signals which are quite correlated) 
and since monochrome systems will only require the encoded Y component. This will 
result in the following three coefficient arrays: $F_Y(u,v)$, $F_I(u,v)$, and $F_Q(u,v)$, derived 
from the three component signal arrays $f_Y(x,y)$, $f_I(x,y)$, and $f_Q(x,y)$, which represent a 
digitized sub-block of the image. This information permits quantizing the I and Q 
signals more coarsely (with fewer bits) than the Y signal. This achieves additional 
data compression. 



Classification Map for Y 

4344434111242133 
4344421121111222 
4344411111211223 
4344121233312323 
4344122223112234 
4444312322212333 
4444411123312233 
4444411113113233 
4444421122123233 
4444421122222323 
3444431133122233 
2233223123431133 
2221343434432113 
2312344331134312 
4422433321111331 
4422433111111141 

Classification Map for I 

4444443323332233 
4444431223222222 
4444312222321233 
4444232233222233 
4444222223222333 
4444322223322333 
4444411112323334 
4444412222223334 
4444412223133334 
4444411123233333 
2444321133113333 
1144111111111133 
1121231211211333 
1221122111124433 
4431211111111443 
3431111111111144 

Classification Map for Q 

4434444322444222 
4434342122233222 
3334421223323222 
4333333444422222 
4334333334332232 
4334333333432333 
4434321122422333 
4444411222222333 
4444411223122233 
4444411133122332 
4444321144112222 
3444111111111122 
4321231211211112 
4311132111114422 
4421211111111341 
4431111111111142 


Figure 8: Typical YIQ component coding scheme, with bit allocation tables based on zonal sampling. 

Adaptivity 

Adaptive transform coding schemes compensate for statistically non-stationary 
images by changing quantization levels or bit allocations. Optimal adaptive methods 
use changes in local statistics. Adaptive techniques tend to improve coding efficiency 
at the expense of raising complexity. A number of simple adaptive and quasi-adaptive 
techniques have been developed. This report investigates some of them. 

The most common adaptive technique uses coarse coefficient bit allocations in 
regions of the image where detail (i.e., high sequency coefficient contribution) is low 
and finer quantization in regions of high detail. A drawback which is immediately 
apparent is the problem of informing the receiver of the quantization scheme. 

An effective way to overcome this problem is the technique known as class 
adaptive transform coding. Depending on the sum of the magnitudes or variances of 
the coefficients (known as the activity index), a particular coefficient array is classified 
into one of K distinct classes. K = 4 is by far the most common [88]. This requires 
only a $\log_2 K$ bit overhead per block to identify the class to which that particular 
block belongs. The added complexity is moderate and the coding efficiency at a given 
fidelity criterion can improve significantly, depending on how stationary the image’s 
statistics are. Figure 9 shows the bit allocations for the four classes for the luminance 
component of one such coding scheme. Note that since different numbers of bits are 
used to code different blocks, there must be an output buffer to allow a constant output 
bit rate [89]. 
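
A minimal sketch of the classification step is given here. It makes several assumptions that are not stated in the text: the activity index is taken as the sum of AC coefficient magnitudes with the DC term excluded, the K classes are equally populated, and class 1 is the most active. All names are illustrative.

```python
import math


def activity_index(block):
    """Sum of coefficient magnitudes, excluding the DC term at block[0][0].

    Excluding the DC term is an assumption made for this sketch.
    """
    total = sum(abs(c) for row in block for c in row)
    return total - abs(block[0][0])


def classify_blocks(blocks, num_classes=4):
    """Assign each block a class 1..K by ranking its activity index.

    Assumption: classes are equally populated and class 1 is the most active.
    """
    order = sorted(range(len(blocks)),
                   key=lambda i: activity_index(blocks[i]), reverse=True)
    classes = [0] * len(blocks)
    for rank, i in enumerate(order):
        classes[i] = rank * num_classes // len(blocks) + 1
    return classes


def class_overhead_bits(num_blocks, num_classes=4):
    """Bits needed to tell the receiver the class of every block."""
    return num_blocks * math.ceil(math.log2(num_classes))
```

With K = 4 the receiver needs two bits per block, so a frame of 4096 blocks carries 8192 bits of class overhead.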

Class adaptive transform coding based on coefficient variances is more efficient 
than coding based on coefficient magnitudes [90]. We will describe two techniques 
here. 

With the first variance-based technique, a procedure known as recursive 
block quantization is used to determine coefficient variances. The coefficients 
are placed into a one-dimensional array using the ordering procedure shown in 
Figure 10, so that F(u,v) becomes $F_i$, and the recursive relationship for coefficient 
variance is given by 

$$\hat{\sigma}^2_{i+1} = w \hat{\sigma}^2_i + (1 - w) \hat{F}_i^2,$$

where $\hat{\sigma}^2_i$ is the quantized value of the variance of the coefficient $F_i$, $\hat{F}_i$ is 
the quantized value of $F_i$, and w is a weighting factor found to be 0.75 for best results. 

[Figure 9: Bit allocations for the four classes (Class 1 through Class 4) of the luminance component] 


Figure 10: Ordering of Coefficients in recursive block quantization 


The DC coefficient is quantized with the full 8 bits, and the initial condition for 
the recursive relationship is found by averaging the square of the first four AC 
coefficients to obtain the initial value of $\hat{\sigma}^2$. 
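
The recursion and its initial condition can be sketched as follows. The sketch assumes the update uses the square of each quantized coefficient, i.e. sigma2[i+1] = w * sigma2[i] + (1 - w) * F_hat[i]**2, and that the coefficients are already in the one-dimensional order of Figure 10. The function names are illustrative.

```python
def initial_variance(ac_coefficients):
    """Average of the squares of the first four AC coefficients."""
    first_four = ac_coefficients[:4]
    return sum(c * c for c in first_four) / len(first_four)


def recursive_variance_estimates(quantized, start_variance, w=0.75):
    """Return one variance estimate per coefficient in the ordered array.

    Each estimate is formed from the previous estimate and the previous
    quantized coefficient, so the receiver can repeat the same recursion.
    """
    estimates = [start_variance]
    for f_hat in quantized[:-1]:
        estimates.append(w * estimates[-1] + (1.0 - w) * f_hat ** 2)
    return estimates
```

Since each estimate depends only on values that have already been quantized, no variance information has to be transmitted in this scheme.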

The second technique [91][92][93] makes two assumptions based on the original 
image, f(x,y). First, this technique assumes the mean value of all AC coefficients is 
0. It also assumes the mean of the DC coefficient is the average brightness of the 
original image f(x,y), which is denoted as m. From the transform definition, this implies 
E{F(0,0)} is 2m. The variances of the coefficients, F(u,v), are then given by 

$$\sigma^2_{0,0} = E\{[F(0,0)]^2\} - 4m^2; \quad u = v = 0,$$

and 

$$\sigma^2_{u,v} = E\{[F(u,v)]^2\}; \quad (u,v) \neq (0,0).$$

Approximate Gaussian density functions for each coefficient are formulated from 
these parameters. Using the results of [94] we can set up Laplacian densities for the 
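AC coefficients. The Laplacian density is given by 

$$p(\beta) = \frac{\alpha_{u,v}}{2} e^{-\alpha_{u,v} |\beta|},$$

where $\alpha_{u,v}$ is a parameter which can be calculated from $\sigma_{u,v}$ by 

$$\alpha_{u,v} = \frac{\sqrt{2}}{\sigma_{u,v}}; \quad (u,v) \neq (0,0),$$

and $\beta$ represents an AC coefficient. 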

The AC energy of a transform block is computed as 


$$E_{AC} = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} [F(u,v)]^2 - [F(0,0)]^2.$$

The magnitude of the AC energy is used to classify the transform block into the high 
energy class or the low energy class. Several frequency regions, consisting of 16 x 16 
blocks, are of particular interest. These regions include low-frequency, mid-frequency, 
high-frequency, and horizontal and vertical edges. Figure 11 shows the locations of 
typical regions. 

Then, the ratio of low frequency AC energy to high frequency AC energy will 
subdivide each of these classes into a high frequency class and a low frequency class. 
High and low frequency coefficients are grouped. This system works better than 
classifying based on AC energy alone since human vision is more sensitive to high 
frequencies at low illuminations [95]. 

Figure 11: AC energy classification by frequency 

As a result of this procedure, the four classes will generally contain different 
numbers of image blocks, which makes it harder to maintain a constant bit rate. The 
solution is to assign bits to the classes under the constraint that the average bit rate 
is constant. 

The next step calculates the ensemble average of the variances of each F(u,v) 
coefficient in the whole frame for each class k by 

$$\sigma_k^2(0,0) = \frac{K N^2}{L^2} \sum_{m} \sum_{n} [F_{m,n}(0,0)]^2 - 4m^2;$$

and 

$$\sigma_k^2(u,v) = \frac{K N^2}{L^2} \sum_{m} \sum_{n} [F_{m,n}(u,v)]^2; \quad (u,v) \neq (0,0);$$

where k = 1, 2, . . ., K and the variables m, n index the various sub-blocks within the total 
image. The total image size is L x L and the sub-block size is N x N. 

Now we can consider bit allocations for each class. The bit allocation matrix for 
class k is given by the following equations 

$$N_{B,k}(u,v) = \frac{1}{2} \log_2[\sigma_k^2(u,v)] - \log_2 D; \quad (u,v) \neq (0,0),$$

and 

$$\log_2 D = \frac{N^2}{2L^2(N^2 - 1)} \sum_{k=1}^{K} c_k \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} \log_2[\sigma_k^2(u,v)] + \frac{b_{DC} - N^2 b_{avg}}{N^2 - 1}; \quad (u,v) \neq (0,0),$$

where $c_k$ is the number of N x N blocks assigned to class k, $b_{DC}$ is the number of bits 
assigned to the DC coefficient (usually 8 bits) and $b_{avg}$ is the desired average bits/pixel 
(bpp) obtained from the desired compression ratio, not including overhead. This may 
need to be done iteratively until the desired number of bits is used exactly. 
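
The bit budget can be checked numerically with a short sketch. It assumes the allocation has the form N_B,k(u,v) = (1/2) log2[sigma_k^2(u,v)] - log2 D for the AC coefficients, and that log2 D is chosen so the unrounded allocations across all classes add up to L^2 * b_avg bits. The function and argument names are illustrative.

```python
import math


def log2_distortion(class_variances, class_counts, n, image_size, b_dc, b_avg):
    """Solve the bit-budget constraint for log2(D).

    class_variances[k][u][v]: variance of coefficient (u, v) in class k.
    class_counts[k]: number of N x N blocks assigned to class k.
    """
    log_sum = 0.0
    for variances, count in zip(class_variances, class_counts):
        ac_logs = sum(math.log2(variances[u][v])
                      for u in range(n) for v in range(n) if (u, v) != (0, 0))
        log_sum += count * ac_logs
    n2 = n * n
    scale = n2 / (2.0 * image_size ** 2 * (n2 - 1))
    return scale * log_sum + (b_dc - n2 * b_avg) / (n2 - 1)


def bit_allocation(variances, n, log2_d, b_dc):
    """Unrounded bits per coefficient for one class; the DC term gets b_dc."""
    bits = [[0.5 * math.log2(variances[u][v]) - log2_d for v in range(n)]
            for u in range(n)]
    bits[0][0] = float(b_dc)
    return bits


def total_bits(class_variances, class_counts, n, image_size, b_dc, b_avg):
    """Total bits per frame implied by the unrounded allocation."""
    log2_d = log2_distortion(class_variances, class_counts, n,
                             image_size, b_dc, b_avg)
    total = 0.0
    for variances, count in zip(class_variances, class_counts):
        bits = bit_allocation(variances, n, log2_d, b_dc)
        total += count * sum(sum(row) for row in bits)
    return total
```

In practice the allocations are rounded to integers and negative values are set to zero, which changes the total. This is one reason the procedure may have to be repeated with an adjusted D until the budget is met.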

Now, the classes and bit allocations have been determined. The next task is to 
normalize the transform samples prior to quantization. For a given class and 
coefficient, the normalization coefficient is given by 

$$\sigma'_k(u,v) = c \cdot 2^{N_{B,k}(u,v) - 1}; \quad (u,v) \neq (0,0),$$

where c is a normalization factor which is the maximum standard deviation value of 
all transform sample values which are assigned one bit. 

The overhead in bits/frame for this rather involved but effective technique is 
given by 


B=2 feJ +b ‘ +6N ’ 

where $b_c$ is the number of bits to encode c. For N = 8, L = 1024 and $b_c$ = 6, the 
overhead is about 8600 bits/frame or about 0.008 bit per pixel [96][97]. Clearly, 
overhead is minimal. 

Interestingly, the concept of threshold sampling of the coefficients described in 
a previous section is by nature an adaptive technique since more coefficients will be 
transmitted when the image block has finer detail [98]. This has the drawback of 
needing to code the positions of the coefficients, however. 

Another adaptive technique is to vary quantization parameters so the MSE is 
maintained at a constant value. This technique is complex to implement and has been 
shown to perform no better than class adaptive transform coding [99]. 

Coding The Coefficients 

Once the F'(u,v) array representing the quantized coefficients is known, further 
data compression can be achieved by coding the coefficients for transmission using the 
entropy coding techniques described earlier. 

Run length coding is a logical technique since the coefficient array will contain 
many zero-valued coefficients. 
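
A minimal sketch of run length coding applied to a one-dimensional scan of quantized coefficients is shown here. The (zero run, value) pair format is one common choice and is an assumption of this sketch, not a format specified in the survey.

```python
def run_length_encode(coefficients):
    """Encode a coefficient sequence as (zero_run, value) pairs.

    Each pair gives the number of zeros that precede a non-zero value.
    A final (zero_run, 0) pair records any trailing zeros.
    """
    pairs = []
    run = 0
    for c in coefficients:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    if run:
        pairs.append((run, 0))
    return pairs


def run_length_decode(pairs):
    """Rebuild the original coefficient sequence from (zero_run, value) pairs."""
    out = []
    for run, value in pairs:
        out.extend([0] * run)
        if value != 0:
            out.append(value)
    return out
```

The gain comes from the long zero runs at high sequencies: a run of any length costs one pair, where direct coding would spend a code word on every zero.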

It has been suggested that chain (or contour) coding of transform coefficients 
could reduce data by 10-30% [100][101]. By chain coding the boundaries between the 
zero coefficient and non-zero coefficient regions, the non-zero coefficients are clustered 
together and can be more efficiently identified. 

Chain codes are a set of directed line segments known as links. A candidate link 
is selected for the chain’s next link if it spans only zero-coefficients and if at least 
one neighboring coefficient, to the right of the link, is non-zero. An eight direction 
Freeman code is shown in Figure 12. This eight-direction code is only the first member 
of a large family of (8 x m)-direction codes, where m is an integer. This family is 
suitable for representing planar curves. 

Figure 12: 8-direction Freeman chain code 

In the case of adaptive transform coding, the chain coding algorithm outlines the 
boundaries of non-zero coefficients in the transform domain. The non-zero coefficients 
and the chain links are then coded for transmission. The coded chain links provide a 
more efficient coding of zero coefficients than does run-length coding. The additional 
complexity of implementing this algorithm is modest. 
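
An eight-direction chain code can be sketched as below. The direction numbering (0 pointing along +x, then counterclockwise in 45 degree steps) is a commonly used convention and is assumed here; it may not match the numbering of the report's figure. The sketch only encodes and decodes a given path and does not implement the boundary-following rule for coefficient regions.

```python
# Assumed numbering: 0 = +x direction, then counterclockwise in 45-degree steps.
FREEMAN_STEPS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}
STEP_OF_CODE = {code: step for step, code in FREEMAN_STEPS.items()}


def chain_encode(points):
    """Convert a path of 8-connected (x, y) points into a list of link codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in FREEMAN_STEPS:
            raise ValueError("consecutive points must be 8-connected neighbours")
        codes.append(FREEMAN_STEPS[step])
    return codes


def chain_decode(start, codes):
    """Rebuild the path from its starting point and link codes."""
    points = [start]
    for code in codes:
        dx, dy = STEP_OF_CODE[code]
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points
```

Each link costs three bits, so a boundary is described by its starting point plus three bits per step.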

The Basis Vectors 

Finally, we need to consider different types of basis vector sets. These sets of 
functions need to be orthogonal and complete. Orthogonality implies a given basis 
vector cannot be represented as a linear combination of the other basis vectors. 
Completeness means any image is specified in terms of a transform coefficient matrix 
and is fully recovered (no distortion) via the inverse transform as long as quantization 
and compression have not yet been performed. The basis vector set is perhaps the 
most widely studied parameter of transform coding and will be treated in-depth here. 

Karhunen-Loeve Transform 

The Karhunen-Loeve transform (KLT) is also known as the Hotelling transform 
or the method of principal components. The transform algorithm selects the optimal 
set of orthogonal basis vectors so the elements of the coefficient array are uncorrelated 
[102], that is 

$$E\{F(u_1,v_1) F(u_2,v_2)\} = 0; \quad (u_1,v_1) \neq (u_2,v_2),$$

with the transform coefficients assumed to be random variables. The basis vectors 
determined by the KLT are actually the eigenvectors of the covariance matrix. 
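
The decorrelating property can be illustrated in the smallest possible case, a pair of neighboring pixel values. For a symmetric 2 x 2 covariance matrix the eigenvectors form a rotation, which can be written in closed form. This is a minimal sketch with illustrative names and sample data, not a practical KLT implementation.

```python
import math


def covariance_2x2(pairs):
    """Sample covariance terms (a, b, d) of [[a, b], [b, d]] for 2-D samples."""
    n = len(pairs)
    m0 = sum(p[0] for p in pairs) / n
    m1 = sum(p[1] for p in pairs) / n
    a = sum((p[0] - m0) ** 2 for p in pairs) / n
    d = sum((p[1] - m1) ** 2 for p in pairs) / n
    b = sum((p[0] - m0) * (p[1] - m1) for p in pairs) / n
    return a, b, d


def klt_basis_2x2(a, b, d):
    """Eigenvectors of [[a, b], [b, d]], returned as the rows of a rotation."""
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def apply_basis(basis, pairs):
    """Project every sample onto the two basis vectors."""
    return [(basis[0][0] * x + basis[0][1] * y,
             basis[1][0] * x + basis[1][1] * y) for x, y in pairs]


# Correlated sample pairs (illustrative data only).
samples = [(1, 2), (2, 3), (3, 5), (4, 4), (5, 6)]
basis = klt_basis_2x2(*covariance_2x2(samples))
transformed = apply_basis(basis, samples)
```

After the projection the cross-covariance of the two coefficients is zero and most of the variance is concentrated in the first coefficient. For N x N blocks the same idea requires the eigenvectors of an N^2 x N^2 covariance matrix.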

Since it produces completely uncorrelated coefficients, the KLT represents the 
optimum transform based on lowest distortion at a given bit rate and the least bits 
required to encode the coefficient array at a given distortion criterion. 

However, the KLT is not easily implemented since it requires by far the greatest 
number of calculations of any transform coding technique (it has no known "fast" 
algorithm). Also since the basis vectors are not known at the receiver, they must be 
encoded along with the coefficient array. This large overhead requirement defeats the 
optimum coding of the coefficient array. 

Yet, the KLT is useful because it sets the limit on the performance achievable 
with any transform coding scheme. In practice, transform coding techniques use a 
suboptimal set of basis vectors which are already known at the receiver. As we shall 
see, the performance of the KLT can be closely approximated by some of these less 
optimal transforms at great computational savings. 

The KLT requires the formation of a set of vectors (this depends on the output 
of the sensing device). It also requires the estimation of the covariance matrix, and 
the calculation of its eigenvectors and eigenvalues. The time needed to calculate 
eigenvectors and eigenvalues is lengthy. It precludes real time, video rate operations. 
Memory is required not only for calculating the eigenvectors and eigenvalues, but 
also for storing the elements of the covariance matrices. 


NUMBER OF COVARIANCE MATRIX ELEMENTS VS. IMAGE SIZE 

      IMAGE SIZE          NUMBER OF COVARIANCE 
                            MATRIX ELEMENTS 
      M        N                M^2 N^2 

      4        4                      256 
      8        4                     1024 
      8        8                     4096 
     16        8                    16384 
     16       16                    65536 
     32       16                   262144 
     32       32                  1048576 
     64       32                  4194304 
     64       64                 16777216 
    128       64                 67108864 
    128      128                268435456 


This table reflects the enormous memory required for the covariance matrix as 
the block size increases. "[A] major drawback is that the required number of 
computation steps is also proportional to M 2 N 2 for an M x N image, which [this table] 
implies, is very large for many images" [103]. 

This lengthy, iterative nature of the KLT is known as principal components 
analysis. By using backward error propagation in a neural network implementation 
of this process, researchers hope to learn the elements of the optimal algorithm. 
Backward error propagation is a supervised learning scheme which changes the 
weights between non-linear units in a neural network. The non-linear units compute 
a sigmoidal function of their inputs. Learning occurs in each unit in the network by 
reducing the MSE of the output image. The learning algorithm produces a nearly 
linear transformation of the input, which are image pixels. The researchers termed 
the results "respectable . . . when compared to current techniques" [104]. 

Compression 

Good quality reconstructed pictures result when using the KLT, while achieving 
compression as low as 0.5 to 1.0 bpp. Experiments demonstrate the MSE ranges from 
1.5% to 0.5% at these rates. 

Spatial Domain 

The KLT is capable of maintaining crisp edges and texture in the reconstructed 
image. These two features, being of higher spatial frequency, are the most sensitive 
to the quantization scheme which is employed. Some blockiness is observed in 
non-adaptive methods. For the best results, an adaptive scheme must be used. This 
suggests a continued interest in neural network implementations of this algorithm. 
With a compression rate of 1.0 to 1.5 bpp, the reconstructed images exhibit a good 
definition of detail. 


Temporal Domain 

No reports of studies involving motion or interframe applications were found. 
The studies emphasize still images, due to the computational complexity of the 
algorithm. 

Aesthetic Appearance 

The KLT proves robust in minimizing transmission errors. In general, this holds 
true for transform techniques. Still, as the bits per pixel decreases, transform encoding 
will show block errors. These errors are caused, in part, by transmission errors. 
However, because of the block oriented processing, transform encoding does not result 
in the compounded errors of other techniques. These compounded errors can cause 
streaking, lost frames, and jerky motion. 

Quantizing errors are distributed throughout the reconstructed image. This 
benefit results in visually less objectionable images. Errors are often undetectable to 
an untrained observer. To show the errors, researchers often employ an image which 
maps %MSE x 4. In such images, one sees that errors are not concentrated to one 
specific feature, such as edges or low spatial frequency areas (contours). To the casual, 
or inexperienced, observer, the greatest errors occur in areas of very high detail or 
texture. 

The KLT and DCT perform the best at preserving edges. Again, edge fidelity 
improves with adaptive quantization. 

Spectral Information 

The distortions produced by encoding color images were generally similar to those 
produced by using monochrome images. 


Discrete Cosine Transform 

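The two dimensional Discrete Cosine Transform (DCT) of an N x N array is given 
by 

$$F_{DCT}(u,v) = \frac{2}{N} C(u) C(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right],$$

where $C(0) = 1/\sqrt{2}$ and C(u) = 1 if u ≥ 1. 

The inverse two dimensional DCT is given by 

$$f(x,y) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u) C(v) F_{DCT}(u,v) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right].$$

For N x N x M sub-blocks, the three-dimensional DCT is given by 

$$F_{DCT}(u,v,w) = \frac{2}{N} \sqrt{\frac{2}{M}} C(u) C(v) C(w) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} \sum_{t=0}^{M-1} f(x,y,t) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right] \cos\left[\frac{(2t+1)w\pi}{2M}\right].$$

The DCT is an attractive choice for a basis vector set since the performance of 
the DCT closely approximates that of the KLT [105]. In fact, for an image which can 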
be modeled by a Markov source with correlation coefficient close to unity, the KLT 
reduces to the DCT [106]. For most classes of images, this is a valid assumption. 
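
A direct implementation of the two dimensional DCT and its inverse is sketched below. It assumes the orthonormal form with scale factor 2/N, C(0) = 1/sqrt(2) and C(u) = 1 otherwise. The code evaluates the double sums literally, so it is far slower than the "fast" algorithms and is meant only to show the definition.

```python
import math


def _c(k):
    """Normalization term C(k): 1/sqrt(2) for k = 0, otherwise 1."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0


def _cos_term(sample, freq, n):
    """cos[(2 * sample + 1) * freq * pi / (2 * n)]."""
    return math.cos((2 * sample + 1) * freq * math.pi / (2 * n))


def dct2(f):
    """Forward 2-D DCT of an N x N block given as a list of lists."""
    n = len(f)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += f[x][y] * _cos_term(x, u, n) * _cos_term(y, v, n)
            out[u][v] = (2.0 / n) * _c(u) * _c(v) * total
    return out


def idct2(coeffs):
    """Inverse 2-D DCT; recovers the block when no quantization is applied."""
    n = len(coeffs)
    out = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            total = 0.0
            for u in range(n):
                for v in range(n):
                    total += (_c(u) * _c(v) * coeffs[u][v]
                              * _cos_term(x, u, n) * _cos_term(y, v, n))
            out[x][y] = (2.0 / n) * total
    return out
```

For a block of constant brightness all of the energy falls in the DC coefficient, which shows the energy compaction that makes coarse quantization of the higher coefficients possible.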

Yet, the DCT has enormous advantages over the KLT in terms of simplicity. 
Since the basis vectors of the DCT are known in advance, they need not be calculated 
for every transform block as they do with the KLT. Also, the basis vectors need not 
be transmitted along with the coefficients, and there are several "fast" algorithms for 
computing the DCT which make real time applications feasible [107]. These savings 
in overhead actually make the DCT more efficient at a fixed bit rate than the KLT. 
This justifies removing the KLT from further consideration for HHVT. In fact, the 
DCT has been shown to have the highest coding efficiency (i.e., highest compression 
ratio with least distortion) of any transform coding scheme. However, in spite of the 
vast reduction in complexity over the KLT, the DCT is still moderate to high in 
complexity of implementation compared to other algorithms. 

Computational simplicity suggests that the Hadamard transform is the best 
choice for an encoding implementation. Yet, as will be demonstrated, it suffers a 
significant performance degradation, especially when compared to the DCT. Specific 
VLSI hardware provides efficient, fast implementations of the DCT. Hence, it appears 
best for applications requiring the highest degree of compression while minimizing 
coding distortion. 

Compression 

Use of the Discrete Cosine Transform results in good quality reconstructed 
pictures, while achieving compression as low as 0.5 to 1.0 bpp. Experiments 
demonstrate the MSE ranges from 1.50% to 0.50% at these rates. Adaptive 
quantization yields even lower MSE. Experimental results range from approximately 
0.75% to 0.20%. The higher bit rate, i.e., less compression, yields the lower MSE. 

Spatial Domain 

The DCT maintains crisp edges and texture in the reconstructed image. These 
two features, being of higher spatial frequency, are the most sensitive to the 
quantization scheme which is employed. To minimize blockiness, an adaptive scheme 
is preferred. With a compression rate of 1.0 to 1.5 bpp, the reconstructed images exhibit 
a good definition of detail. 

Temporal Domain 

Several reports of studies involving motion or interframe applications indicate 
gains in both compression and %MSE. This is a result of greater correlation between 
pixels in successive frames. Unfortunately, the increase in computation time, and 
circuit complexity and size, has limited the practice of interframe techniques. However, 
the use of a charge coupled device (CCD) or charge injection device (CID) sensors would 
greatly facilitate these algorithms. 

A three dimensional DCT yields roughly a 30% improvement over the 
performance of a two dimensional DCT. When a motion compensated (MC) algorithm 
performs interframe compression, one achieves a 100% improvement over the 2D-DCT. 
These improvements are most noticeable in areas of no motion. As the motion 
increases, the advantages of the interframe encoding decrease; but this suggests a 
preservation of motion. 

Aesthetic Appearance 

The DCT proves robust in minimizing transmission errors. Schemes using very 
low levels of encoding, from 0.5 to 1.0 bpp, result in a great degree of compression. 
However, at these low levels, the reconstructed image becomes more sensitive to 
transmission errors. One begins to notice block errors. 

The reconstructed image has very evenly distributed quantizing errors. This 
benefit results in visually less objectionable images. As with the Karhunen-Loeve 
transform, errors are often undetectable to an untrained observer. The most noticeable 
errors occur in areas of very high detail or texture. 

The DCT performs the best at preserving edges. Again, edge fidelity improves 
with adaptive quantization. 

A hybrid MC/2D-DCT does not portray the blurred motion seen in some 
non-transform based techniques. While yielding a spatially good quality reconstructed 
image, it does have some slight flicker in intensity around the boundaries of the motion. 
This flicker does not inhibit an observer’s viewing or evaluation of the content of the 
image. A discussion of this method in the section on hybrid techniques will show some 
means to eliminate this. 

Spectral Information 

RGB distributed errors produce a slight reduction in color purity. For this reason, 
RGB values are converted to less correlated YIQ values. Discrete cosine 
transformations of the YIQ yield very accurate color in reconstructed images. Between 
adjacent pixels, the DCT maintains slight variations in hue and saturation in the 
reconstructed image. Very little contouring or banding appears. However, if the 
encoding is horizontally oriented, vertical errors appear as blocks in background 
areas. Often, the quantization procedure will place most of the errors in the Q- (and 
to a lesser degree, the I-) components. This minimizes the distortions in the 
reconstructed color images. 

Slant Transform 

The Slant transform was developed to take advantage of linear changes in 
brightness which occur in some classes of images. The transform is an orthogonal set 
of sawtooth waveforms. 

These waveforms are generated via a recursive matrix procedure. The Slant 
matrix of order two is given by 

$$S_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$$

and the matrix of order four is given by 

$$S_4 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 0 & 1 & 0 \\ a_4 & b_4 & -a_4 & b_4 \\ 0 & 1 & 0 & -1 \\ -b_4 & a_4 & b_4 & a_4 \end{bmatrix} \begin{bmatrix} S_2 & \mathbf{0} \\ \mathbf{0} & S_2 \end{bmatrix}, \quad \text{where } \mathbf{0} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix},$$

where $a_4$ and $b_4$ are determined by requiring a linear (constant negative slope) function 
formed in row two, which yields $a_4 = 2b_4$. This and the orthonormality condition 
$S S^T = I$ lead to the following for $S_4$: 

$$S_4 = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 & 1 \\ \frac{3}{\sqrt{5}} & \frac{1}{\sqrt{5}} & \frac{-1}{\sqrt{5}} & \frac{-3}{\sqrt{5}} \\ 1 & -1 & -1 & 1 \\ \frac{1}{\sqrt{5}} & \frac{-3}{\sqrt{5}} & \frac{3}{\sqrt{5}} & \frac{-1}{\sqrt{5}} \end{bmatrix}$$

A second iteration of the process given in [108] results in the eighth order Slant 
matrix, $S_8$, as shown in Enomoto and Shibata [109]. 

The Slant transform does not perform as well as the DCT, but is significantly 
easier to implement. The irrational constants the matrices contain may be stored in 
a look up table in a hardware implementation [110]. Pratt et al. develop a reordering 
of these matrices to yield a signal flow graph from input image values to Slant transform 
values. This signal flow graph leads directly to a systolic array based, fast, hardware 
implementation. 
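
The order-four Slant matrix can be built and checked with a short sketch. It assumes the commonly published recursive form, in which a 4 x 4 mixing matrix with entries a4 and b4 multiplies a block-diagonal matrix of two order-two matrices, with a4 = 2 * b4 and a4^2 + b4^2 = 1 so that the rows are orthonormal. The helper names are illustrative.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(m):
    """Transpose of a matrix stored as a list of lists."""
    return [list(row) for row in zip(*m)]


def slant_4():
    """Order-four Slant matrix, assuming a4 = 2 * b4 and orthonormal rows."""
    b4 = 1.0 / math.sqrt(5.0)
    a4 = 2.0 * b4
    r = 1.0 / math.sqrt(2.0)
    # Block-diagonal matrix holding two copies of S2 = r * [[1, 1], [1, -1]].
    block = [
        [r, r, 0.0, 0.0],
        [r, -r, 0.0, 0.0],
        [0.0, 0.0, r, r],
        [0.0, 0.0, r, -r],
    ]
    mix = [
        [1.0, 0.0, 1.0, 0.0],
        [a4, b4, -a4, b4],
        [0.0, 1.0, 0.0, -1.0],
        [-b4, a4, b4, a4],
    ]
    return [[r * x for x in row] for row in matmul(mix, block)]
```

The second row of the result is proportional to (3, 1, -1, -3), a uniformly descending staircase. This is the "slant" basis vector that matches linear changes in brightness.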

Compression 

With a compression of 1.0 to 1.5 bpp, the Slant transform produces fine 
reconstructed images. Use of zonal, adaptive coding of the coefficients helps maintain 
a %MSE of 1.0% or below. Adaptive techniques provide additional bandwidth reduction 
of 50% for the same degree of image quality. 

The optimum block size for this transform appears to be 8 x 8 or 16 x 16 pixels. 
As block sizes within the image increase, the performance of the transformation 
approaches that of the Haar and Hadamard transforms. The Slant transform results 
in a lower MSE for moderate size image blocks when compared to the Haar and 
Hadamard transforms [111]. 

Spatial Domain 

The Slant transform reproduces linear variations of brightness quite well. 
However, its performance at edges is not as good as that of the KLT or DCT. Because of 
the "slant" nature of the lower order coefficients, its effect is to smear edges. At 1.5 
bpp, the Slant transformed image is much more desirable than the Hadamard or Haar 
transforms. 

Temporal Domain 

No information was reviewed. 

Aesthetic Appearance 

High quality images are produced at a compression ratio of approximately one 
half. But, as the rate of compression increases, the quality of the images drops off. 
Compared to the Hadamard and Haar transforms, the Slant transform showed almost 
no block effects. 

The Slant technique is relatively tolerant to transmission errors. Such errors 
tend to extend only to the end of one block. This depends, partly, on the coding method 
used. 


Spectral Information 

Colors, when expressed in YIQ terms, seem quite accurate in reconstructed 
images. At 2.0 bpp, block effects appear most noticeable in areas of low spatial 
frequency, i.e., backgrounds. Edges in the reconstructed image are slightly less crisp 
than in the original image. Still, they are quite clearly defined and discernible. Slight 
variations in hue and saturation within adjacent areas are still preserved. Texture 
in background elements, e.g., carpet and upholstery, is preserved very well in the 
reconstructed image. At the same compression, reconstructed color images appear to 
perform better than monochrome reconstructed images in maintaining texture and 
fine detail. This suggests using color images when one wants to preserve texture. 

Hadamard Transform 

The basis vectors of the Hadamard transform comprise a set of orthogonal 
rectangular waveforms which only assume values of 1 or -1 and are defined over a 
given spatial interval. These waveforms are known as Walsh functions and the 
transform has also been called the Walsh transform or the Walsh-Hadamard transform 
[112]. 

There are several ways of deriving these rectangular waveforms. The Hadamard 
basis vectors in natural order can be found using Hadamard matrices. A Hadamard 
matrix is square and has elements of 1 and -1 only and the rows and columns of the 
matrix are orthogonal. The lowest order Hadamard matrix is of order two, 

$$H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$

For N x N matrices where N is a power of 2, $H_N$ can be recursively derived as 

$$H_N = \begin{bmatrix} H_{N/2} & H_{N/2} \\ H_{N/2} & -H_{N/2} \end{bmatrix}.$$

Since a Hadamard matrix equals its transpose, the Hadamard transform 
in two dimensions [113] is given by 

$$[F_H(u,v)] = [H_N] [f(x,y)] [H_N]$$

and since the inverse of a Hadamard matrix is itself multiplied by the scalar 1/N, 
the inverse transform is given by 

$$[f(x,y)] = \frac{1}{N^2} [H_N] [F_H(u,v)] [H_N]$$

The natural ordered set of basis vectors obtained via Hadamard matrices is not 
ordered by increasing sequency. Thus, the coefficients must be reordered before 
techniques such as zonal sampling are applied. 
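
The natural-order construction can be sketched in a few lines. The sketch assumes the unnormalized matrices (entries of 1 and -1 only), so the inverse carries a factor of 1/N^2 in two dimensions. The helper names, including the `sequency` function that counts sign changes along a row, are illustrative.

```python
def hadamard(n):
    """Natural-order Hadamard matrix of order n, where n is a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("order must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Apply the recursion H_2k = [[H_k, H_k], [H_k, -H_k]].
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def matmul(a, b):
    """Product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def hadamard_2d(f):
    """Forward transform F = H f H, using additions and subtractions only."""
    h = hadamard(len(f))
    return matmul(matmul(h, f), h)


def inverse_hadamard_2d(coeffs):
    """Inverse transform f = (1 / N^2) H F H."""
    n = len(coeffs)
    h = hadamard(n)
    g = matmul(matmul(h, coeffs), h)
    return [[x / (n * n) for x in row] for row in g]


def sequency(row):
    """Number of sign changes along a basis vector."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)
```

For order four the rows have sequencies 0, 3, 1 and 2, which shows why the natural ordering has to be rearranged before a zonal mask is applied.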

The sequency ordered Hadamard transform may be obtained by Boolean 
synthesis. Here the transform is given by 

$$F_H(u,v) = \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) (-1)^{\sum_{i=0}^{n-1} [g_i(u) x_i + g_i(v) y_i]}$$

where $g_0(u) = u_{n-1}$; $g_1(u) = u_{n-1} + u_{n-2}$; $g_2(u) = u_{n-2} + u_{n-3}$; ... 

and $u_{decimal} = (u_{n-1} u_{n-2} u_{n-3} \ldots u_1 u_0)_{binary}$ 

and similarly for v. 

The additions within the summation in the exponent are performed modulo-2, that is, 
binary additions with no carry. 
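
The one-dimensional kernel of this formula can be checked with a short sketch. It assumes N = 2^n, that x_i denotes bit i of x with x_0 the least significant bit, and that g_i(u) = u_(n-i) + u_(n-i-1) for i >= 1, following the pattern of the terms listed above. The names are illustrative.

```python
def walsh_exponent(u, x, n_bits):
    """Sum over i of g_i(u) * x_i, reduced modulo 2."""
    total = 0
    for i in range(n_bits):
        if i == 0:
            g = (u >> (n_bits - 1)) & 1
        else:
            g = ((u >> (n_bits - i)) & 1) + ((u >> (n_bits - i - 1)) & 1)
        total += g * ((x >> i) & 1)
    return total % 2


def sequency_ordered_matrix(n_bits):
    """Rows are the 1-D basis vectors, indexed by u, for N = 2 ** n_bits."""
    size = 1 << n_bits
    return [[(-1) ** walsh_exponent(u, x, n_bits) for x in range(size)]
            for u in range(size)]


def sign_changes(row):
    """Number of sign changes along a basis vector."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)
```

With this ordering, row u has exactly u sign changes, so the coefficient index is the sequency. The two-dimensional kernel is the product of the row kernel in (u, x) and the column kernel in (v, y).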

One of the most advantageous properties of the Hadamard transform is that the 
basis vectors assume values of 1 or -1 only. Thus, computing the transform involves 
only combinations of addition and subtraction of the samples in the image array. The 
"fast" algorithm for the Hadamard transform involves the fewest calculations of any 
transform algorithm (only $N \log_2 N$ additions/subtractions and no long multiplications, 
since all multiplications involve powers of two and can be performed by shift 
operations) [114]. 

Another useful property of the Hadamard transform is that the coefficients in 
thfi F(u , v ) array will either be all even or all odd [115]. This permits further coding 
efficiency by assigning a bit for even or odd to the whole array and then truncating 
the least significant bit from each coefficient with no loss of information. 
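The even/odd property follows because every coefficient is a signed sum of the same samples, so each has the parity of the sample total. The sketch below illustrates the resulting bit saving in one dimension for brevity; the function names are hypothetical, and integer samples are assumed.

```python
def hadamard_coeffs(samples):
    """Naive natural-ordered Hadamard coefficients: sign is (-1)^popcount(k & i)."""
    n = len(samples)
    return [
        sum(-s if bin(k & i).count("1") % 2 else s for i, s in enumerate(samples))
        for k in range(n)
    ]


def pack(coeffs):
    """Split coefficients into one shared parity bit and LSB-truncated values."""
    parity = coeffs[0] & 1
    if any((c & 1) != parity for c in coeffs):
        raise ValueError("coefficients do not share a parity")
    return parity, [c >> 1 for c in coeffs]


def unpack(parity, truncated):
    """Restore the original coefficients with no loss of information."""
    return [(t << 1) | parity for t in truncated]
```

One bit per array replaces one bit per coefficient, and `unpack(*pack(c))` reproduces `c` exactly, including negative coefficients.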

The Hadamard transform is the transform technique most easily implemented 
in VLSI for real time applications [116][117][118][119]. Rather than zonal or threshold 
sampling, ranking of coefficients yields the best results [120]. This ranking is 
determined by spatial and temporal precedence guidelines [121]. A given coefficient 
should not be transmitted unless those coefficients with higher precedence are also 
transmitted. The result of not following this procedure is undesired edges or blurring. 
Logarithmic quantization of coefficients is also desirable [122]. 
Compression 

Despite its computational simplicity, the coding efficiency of the Hadamard 
transform is significantly below that of the DCT or Slant transforms. Thus, its 
compression is slightly less than the compression achieved with the previous transform 
kernels (or basis vectors). As the block size increases above 8x8, experiments show 
a marked increase in the MSE values between this transform and the KLT, DCT, and 
Slant transforms. 

Compression down to the range of 1.0 to 1.5 bpp produces good quality 
reconstructed images. The %MSE between the original and reconstructed images falls 
between 1.0% and 1.5%. 

Spatial Domain 

At equal compression ratios, more distortion appears in a Hadamard transformed 
image than a cosine transformed image. This applies both to mean square error criteria 
and subjective evaluation [123]. Mean square error images (%MSE) show how 
quantization error tends to gather in areas of high spatial frequency such as edges 
and texture. This indicates poor coding performance. Uncorrelated pixels in MSE 
images indicate good performance. Quantization errors, which are more localized than 
with the previously discussed transforms, do not distort picture elements. Yet, compared to 
the others, the reconstructed image appears out of focus. 

Temporal Domain 

By taking advantage of interframe pixel correlation, the Hadamard transform 
yields a 30% to 100% improvement in compression (expressed as bits per pixel) and 
%MSE. 

Aesthetic Appearance 

Some block errors appear as the pixel rate approaches 1.0 bpp. Undesired edges 
or blurring may result from improper quantization of the coefficients. 

The results of Hadamard transform encoding produce visually less objectionable 
pictures than those resulting from lossy, predictive schemes. The pictures contain fair 
edge reproduction. However, subjective comparisons of images employing other 
transforms indicate the Hadamard provides the least desirable images. Experiments 
in which quantization and bit allocation were the same for all transforms support this 
conclusion. The only part of the implementation which varied was the particular 
transform algorithm. 

Spectral Information 

The distortions produced by encoding color images were generally similar to those 
produced by using monochrome images. 
Haar Transform 

The basis vectors for the Haar transform are also rectangular waveforms, but 
they may assume values of A, 0 and -A, where A is a constant which depends on the 
"order" of the basis vector. The first Haar basis vector, HAR_0(x), in one dimension is a 
DC value defined over the interval (from 0 to N-1) over which the transform is taken. The 
rest of the set of Haar functions over the same interval can be found by 


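$$\mathrm{HAR}_{2^p+n}(x) = \begin{cases} 2^{p/2} & \text{for } \dfrac{n}{2^p} \le x < \dfrac{n + 1/2}{2^p} \\[1ex] -2^{p/2} & \text{for } \dfrac{n + 1/2}{2^p} \le x < \dfrac{n + 1}{2^p} \\[1ex] 0 & \text{for all other } x \end{cases}$$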


and the two dimensional Haar transform is found by 


$$F_h(u,v) = \frac{1}{N}\sum_{x=0}^{N-1}\sum_{y=0}^{N-1} f(x,y)\,\mathrm{HAR}_u(x)\,\mathrm{HAR}_v(y).$$

The Haar transform represents a measure of locally concentrated differential 
energy within a sub-picture. The Haar transform is not a "sequency" type transform; 
thus, zonal sampling is generally not used. 

The Haar functions’ rectangular shapes make them fairly easy to implement in 
VLSI circuits, but the scaling factors involving fractional powers of two make the Haar 
transform more difficult to implement than the Hadamard transform. At the same 
compression ratio, the Haar transform gives a slightly lower mean-square error 
than the Hadamard transform. 

Compression 

Interframe adaptive encoding results in a compression of 0.70 to 1.70 bpp with 
a %MSE ranging from 0.8% to 0.2%. 

Spatial Domain 

At maximum compressions, the Haar transform produces accurate 
reconstructions of monochrome and color images. When the YIQ component coding is 
utilized, color images retain much texture and detail. The greatest drawback of this 
transform scheme is its effect on edges. Edges in the reconstructed image show a 
decided blockiness. 

Temporal Domain 

Interframe compression yields approximately a 20% improvement in compression 
when compared to two dimensional DCT, KLT, or Slant transforms. 
Aesthetic Appearance 

Elements in the image are still discernible; however, the loss of resolution in 
textured areas proves annoying to many observers. The effect produces an out of focus 
appearance. Block errors along edges are definitely noticeable at maximum 
compression rates. 

Spectral Information 

YIQ component images perform as well as, or even better than, monochrome 
images. The use of color may help to preserve texture and fine detail. 
Hybrid Techniques 

Discrete Cosine Transform/Vector Quantization 

Discrete Cosine Transform/Vector Quantization (DCT/VQ) [124][125][126] 
involves using VQ on the DCT coefficients. It provides an opportunity to improve upon 
the compression/quality achievable by either of the techniques individually. According 
to rate-distortion theory, for a given amount of distortion, a lower rate can be obtained 
by using vector, rather than scalar coding. Therefore, using VQ on the DCT coefficients 
will provide better performance than any quantization scheme which treats them as 
a set of scalars. Also, the statistics of the DCT coefficients can be modeled more easily 
and with greater accuracy than the statistics of the image itself. This allows for more 
efficiency in the design of the vector codebooks and less computation time in the 
implementation of VQ. In some cases the higher sequency coefficients are dropped. 
This reduces the dimension of the codevectors which significantly reduces the number 
of computations needed to use VQ. 

Compression 

Images of 480 x 768 pixels digitized at 8 bpp have been encoded at 0.7 - 0.8 bpp 
with absolute average error of 5 - 10 levels. This is not high quality. Another method 
used 1.1 bpp to reproduce an image with no visible distortion. The SNR of DCT/VQ 
was found to be 1 - 2 dB higher than DCT with scalar quantization for the same bit 
rate. 


Spatial Domain 

The types of errors that occur are not easy to predict. Degradations due to VQ 
would cause errors similar to those produced by using DCT at a low bit rate. The 
degradations due to throwing away the high sequency coefficients are discussed in the 
transform section. 

Temporal Domain 

Not Applicable. 

Aesthetic Appearance 

Low bit rates (below 0.5 bpp) produce some blockiness. 

Spectral Information 

Not reported. 
Discrete Cosine Transform/Motion Compensation 

Television information contains temporal redundancy as well as spatial 
redundancy. That is, a pixel of one frame is often the same as the pixel in the same position 
in the following frame and therefore need not be transmitted. The pixel description 
of the earlier frame could be used. The compression techniques addressed thus far, 
in this report, are directed toward reducing spatial redundancy and lossy compression 
only in the spatial domain of a single frame. Conditional Replenishment is used, in 
part, to remove temporal redundancy where corresponding pixels of one frame are the 
same as a prior frame. In Conditional Replenishment the pixels are separated into 
two groups, those pixels, called background pixels, which are the same as their 
corresponding pixel in the previous frame; and those pixels classed as moving area 
pixels, which are not the same as their corresponding pixel. Only the moving area, 
changed pixels, and their locations are transmitted. This method can be improved by 
sending an estimate of the displacements of groups of pixels, such as those that might 
be representative of a moving object, to provide MC. 
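Conditional Replenishment as described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, frames are assumed to be nested lists of gray levels, and the `threshold` parameter is an added assumption (a value of 0 reproduces the strict "same as the previous frame" test in the text).

```python
def conditional_replenishment(previous, current, threshold=0):
    """Return (row, column, value) for every moving-area pixel.

    A pixel is a background pixel when it differs from the corresponding
    pixel of the previous frame by no more than `threshold`.
    """
    return [
        (r, c, value)
        for r, (prev_row, curr_row) in enumerate(zip(previous, current))
        for c, (old, value) in enumerate(zip(prev_row, curr_row))
        if abs(value - old) > threshold
    ]


def replenish(previous, updates):
    """Receiver side: rebuild the current frame from the previous frame."""
    frame = [row[:] for row in previous]
    for r, c, value in updates:
        frame[r][c] = value
    return frame
```

Only the changed pixels and their locations appear in the transmitted list, and with `threshold=0` the receiver reconstructs the current frame exactly.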

In hybrid transforms, interframe DPCM has been combined with MC, such as 
DCT/DPCM. The approach used in DCT with interframe DPCM is shown in the 
following figure. 



Figure 13: Hybrid interframe Transform/DPCM encoder. 

The steps used in the Transform/DPCM interframe coding are 

1. Partition the frame into blocks. 

2. Take the two dimensional, spatial Discrete Cosine Transform of each 
   block to yield, for each block, a block of transform coefficients. 

3. Predict the coefficients of the kth block of the present frame from the 
   corresponding coefficients of the kth block of the prior frame. 

4. If the difference is less than a selected threshold, send a zero block; if it 
   is greater, send the prediction error. 
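Steps 1 to 4 can be sketched as follows. The sketch makes several assumptions that the text does not specify: the prediction for a block is simply the decoder's copy of the same block in the prior frame, the "difference" is measured as the largest absolute coefficient error, the prediction error is sent unquantized, and the DCT is computed directly from its definition rather than by a fast algorithm. All names are hypothetical.

```python
import math


def dct_2d(block):
    """Orthonormal two-dimensional DCT-II of a square block (direct form)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    return [[alpha(u) * alpha(v) * sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]


def blocks(frame, size):
    """Step 1: partition the frame into size x size blocks in raster order."""
    for top in range(0, len(frame), size):
        for left in range(0, len(frame[0]), size):
            yield [row[left:left + size] for row in frame[top:top + size]]


def encode_frame(frame, prior_coeffs=None, size=4, threshold=1.0):
    """Steps 2-4. Returns (messages, coeffs); pass coeffs to the next call.

    A message of None stands for a "zero block"; otherwise it is the
    block of prediction errors.
    """
    messages, coeffs = [], []
    for k, block in enumerate(blocks(frame, size)):
        current = dct_2d(block)                                    # step 2
        prior = prior_coeffs[k] if prior_coeffs else [[0.0] * size for _ in range(size)]
        error = [[a - b for a, b in zip(rc, rp)]                   # step 3
                 for rc, rp in zip(current, prior)]
        if max(abs(e) for row in error for e in row) < threshold:  # step 4
            messages.append(None)
            coeffs.append(prior)      # decoder keeps its old coefficients
        else:
            messages.append(error)
            coeffs.append(current)    # decoder adds the error to its copy
    return messages, coeffs
```

Encoding the same frame twice produces only zero blocks on the second pass, while a change confined to one block produces a prediction error for that block alone.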
Compression is achieved by redundancy removal and truncation of the coefficient 
set using the DCT in the spatial domain, and by redundancy removal and lossy 
compression using DPCM in the temporal domain. The use of MC can be added to the 
above process in the following manner: 

5. If the difference is less than a selected threshold, send a zero block. 

6. If it is greater, search adjacent pixels or coefficients to define spatial 
   displacement between frames. A recursive algorithm using pixel 
   intensities or transform coefficients can be used, or a block matching 
   algorithm can be used to define a displacement vector. 

7. Send the prediction error along with the displacement vector. 
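The block matching option in step 6 can be illustrated with an exhaustive search. This is a minimal sketch under stated assumptions: the matching criterion (sum of absolute differences), the block size, the search range, and the function names are all choices made for the example and are not specified in the text.

```python
def sad(previous, current, top, left, dy, dx, size):
    """Sum of absolute differences between a current block and a displaced
    block of the previous frame."""
    return sum(
        abs(current[top + r][left + c] - previous[top + dy + r][left + dx + c])
        for r in range(size) for c in range(size)
    )


def best_displacement(previous, current, top, left, size=4, search=2):
    """Exhaustive search over +/- `search` pixels.

    Returns (cost, dy, dx), where (dy, dx) points from the block in the
    current frame to its best match in the previous frame.
    """
    rows, cols = len(previous), len(previous[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= top + dy and top + dy + size <= rows
                      and 0 <= left + dx and left + dx + size <= cols)
            if not inside:
                continue
            cost = sad(previous, current, top, left, dy, dx, size)
            if best is None or cost < best[0]:
                best = (cost, dy, dx)
    return best
```

The prediction error of step 7 would then be formed against the displaced block rather than the co-located one.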

Compression 

The applications of DCT/DPCM with MC, described in the recent literature, are 
for video conferencing. The resolution of the video test image sequences is not 
representative of NTSC video images. The motions are limited and more constrained, 
representing video conferencing material content, showing individuals or groups of 
people engaged in normal conference or meeting activities. Compared with 
Transform/Conditional Replenishment without MC, the use of MC provides a 30% to 
40% improvement in bit rate [127]. Compressions of 0.1 to 0.4 bpp, for "good" quality 
pictures judged subjectively, have been achieved using adaptive DCT [128] and separate 
coding of pulse-like components which cause slope overload in DPCM [129]. 

Spatial Domain 

As compression becomes large (0.1 bpp), block noise and granular noise, the 
common distortions of transform coding, become more visible. 

Temporal Domain 

The use of MC with DCT/DPCM produces fluctuations in luminance in the region 
of moving edges. This effect has the appearance of mosquitoes, termed the mosquito 
effect, and although not large in amplitude, can be very annoying. Moving areas, or 
objects, cover stationary background in the direction of motion, and uncover 
background away from the direction of motion. This produces a step change between 
successive frames, resulting in pulse-like peaks using DPCM. The separate coding of 
these peaks using Scalar Quantization (SQ) has been used to significantly reduce this 
distortion [130]. 

Aesthetic Appearance 

Block matching at the edge of the transform blocks, granular noise due to coarse 
quantization, and the mosquito effect are concentrated in localized regions and 
therefore are more visible than uniformly distributed errors. MSE values are therefore not 
representative of the subjective effects of the errors. 

Spectral Information 

Edge blockiness and the mosquito effect result in discontinuities and noise 
fluctuations in chrominance as well as luminance, resulting in spectral errors in the 
region of block edges and spectral noise in the region of moving edges. 
HIGH SPEED IMPLEMENTATIONS OF VDC ALGORITHMS 

The literature on video data compression techniques consists mostly of papers 
detailing the development or improvement of specific techniques. Usually, the paper 
includes experimental results comparing the compression ratio and image degradation 
of the new technique with previously existing ones. The experiments are generally 
performed by implementing the compression techniques in software and applying them 
to standard test images. The speed at which the compression can be performed is not 
addressed as a major concern, and it is seldom discussed at all. Many papers, 
addressing the practical implementations of VDC techniques for specific applications, 
jump immediately to hardware implementations. Some of the implementations involve 
designs of chips for specific tasks. Others use standard digital signal processing (DSP) 
chips. Using today’s technology, VDC software written for a general purpose 
microprocessor is not fast enough for real-time applications. Therefore, software 
implementations will not be considered any further. 

Some representative examples of hardware implementations of high speed video 
data compression techniques are presented in the Table of Hardware Implementations 
at the end of this section. Most of these are implementations of predictive or transform 
techniques. The applications for which the techniques have been implemented 
generally involve television signals at video rates (30 frames/sec). In terms of pixel 
rate, "video rate" covers a large range. The actual pixel rate depends on the method 
of digitization as well as the frame rate. The highest standard throughput for digitized 
video signals is 14.32 Mp/s for an NTSC signal sampled at 4 fsc. Other standards 
produce rates in the range of 7.8 - 13.5 Mp/s. Some of the implementations can be 
used at higher rates than were reported, but the developers did not try to do so because 
the practical applications did not exist. 

One-dimensional DPCM implementations have produced output of 3 - 5 bpp with 
input rates of 10 - 14.3 Mp/s. Two-dimensional DPCM can compress an image down 
to 3 - 4 bpp at input rates as high as 10.7 Mp/s. Interframe DPCM has achieved 
compression to 1.6 bpp at 10.6 Mp/s. 

Separable N x N transforms can be implemented using a technique known as 
row/column decomposition. This technique involves performing N 1 x N transforms, 
a matrix transposition (accomplished by careful addressing of RAM), and another 
N 1 x N transforms. Most of the popular transforms are separable and also have "fast" 
implementations which reduce the number of arithmetic operations. These two 
characteristics make simple, high speed hardware implementations feasible. 
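Row/column decomposition can be sketched in software as shown below. The fast Hadamard transform is used as the one-dimensional transform purely for illustration; any separable transform could be passed in its place. The function names are hypothetical, and the final transposition is an added step that restores the usual (u, v) orientation of the coefficient array.

```python
def hadamard_1d(vector):
    """Fast one-dimensional Hadamard transform (length a power of 2)."""
    a = list(vector)
    half = 1
    while half < len(a):
        for start in range(0, len(a), 2 * half):
            for i in range(start, start + half):
                a[i], a[i + half] = a[i] + a[i + half], a[i] - a[i + half]
        half *= 2
    return a


def transpose(matrix):
    """Matrix transposition (done in hardware by careful RAM addressing)."""
    return [list(column) for column in zip(*matrix)]


def transform_2d_row_column(block, transform_1d=hadamard_1d):
    """Two-dimensional separable transform by row/column decomposition."""
    rows_done = [transform_1d(row) for row in block]      # N 1 x N transforms
    turned = transpose(rows_done)                         # matrix transposition
    columns_done = [transform_1d(row) for row in turned]  # another N 1 x N transforms
    return transpose(columns_done)
```

For an N x N block this needs 2N one-dimensional transforms instead of a direct evaluation of every coefficient from all N^2 samples.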

Hadamard transform techniques have been implemented to produce 2 bpp at 8.1 
Mp/s and 0.5 bpp at 1.8 Mp/s. Also, a system using a pipeline fast Hadamard transform 
configuration has been implemented at 128 Mp/s [131]. 

DCT techniques can achieve 1.6 bpp at 10.4 Mp/s and 0.82 bpp at 9.7 Mp/s. There 
have also been DCT chips produced which perform the two-dimensional transform at 
14.3 Mp/s and above. A chip architecture to implement DCT at data rates up to 27 Mp/s 
(the combined video rate of component coded television, CCIR Rec. 601) has been 
proposed. In neither of these cases was a quantizer for the transform coefficients 
included, so neither is a complete compression implementation. 
A proposed technique performs discrete unitary transforms (DCT, DFT, etc.) 
optically with incoherent light by using acousto-optic spatial modulators and a CCD 
camera for detection and storage [132]. This technique can perform two-dimensional 
transforms of entire images at "video rate" since the speed is limited by the CCD 
camera. Performing many smaller transforms would require many modulators and 
light sources in parallel. Therefore, this implementation would be most practical for 
performing transforms on large blocks or entire images. With the appropriate 
bit-allocation schemes, using larger blocks can produce better compression according 
to information theory. This technique is not a focal plane implementation since 
row/column decomposition is used. The second one-dimensional transform uses the 
output of the first one-dimensional transform which is fed back from the camera to a 
spatial modulator. 

Optical transforms can be performed using the electro-optical spatial modulators 
and CCD camera if the transform is separable. Current parallel scan architecture of 
the HHVT system does limit the usefulness of this approach if large blocks are to be 
considered. 

An edge detection technique that produces a 1 bpp edge map of the image was 
implemented at 10 Mp/s. Other types of filters have been developed that run at clock 
rates of 30 MHz and above and process images at rates of 5 - 10 Mp/s. 

A parallel processing implementation of BTC has been proposed. It is based on 
early 1980’s technology and would compress about 0.5 Mp/s per processing element 
(PE). The output would be 1.625 bpp. With today’s technology it is likely that such a 
technique could be implemented at a higher data rate, thereby allowing a high-speed 
implementation without an unreasonable number of PEs. 

A hardware implementation of a vector quantization technique is planned. It 
would operate at 11.8 Mp/s. No indication was given of the planned compression ratio. 

Recently, experimental work has been done on the compression of HDTV signals 
which use about five times the bandwidth of conventional television signals. Various 
compression techniques have been developed. None of these techniques attempts digital 
compression at the full Nyquist rate. They use sub-sampling or analog filtering to 
reduce the pixel rate before applying digital compression algorithms. A previous pixel 
DPCM coder has been developed that operates at 16 MHz and produces 5 bpp. Another 
DPCM technique has been demonstrated at 16.2 Mp/s [133][134]. 
TABLE OF HARDWARE IMPLEMENTATIONS OF VDC ALGORITHMS 



















131. Noble, S.C., "A Comparison of Hardware Implementations of the Hadamard Transform...," 
     Proceedings of the SPIE, vol. 66, pp. 207-11, 1975. 

132. Lebreton, G., "Image Data Compression Using Unitary Transforms: Video-Rate Optical...," 
     Proceedings of the SPIE, vol. 492, pp. 278-283, 1984. 

133. Gaggioni, H. and Le Gall, D., "Digital Video Transmission and Coding for the Broadband 
     ISDN," IEEE Transactions on Consumer Electronics, vol. CE-34, pp. 16-34, February 1988. 

134. Hopkins, R., "Advanced Television Systems," IEEE Transactions on Consumer Electronics, 
     vol. CE-34, pp. 1-15, February 1988. 




USER REQUIREMENTS CASE STUDIES 
Background 

Baseline Near-term HHVT Video System 

Some of the significant specifications of this system are 

1. Monochrome only. 

2. 1024 x 1024 pixels maximum resolution of the sensor. 

3. Sub-framing capability with pixels addressable in 8 x 8 blocks. 

4. 8-pixel parallel scan format. 

5. 80 Mp/s (8 x 10^7 pixels/second) maximum sensor pixel addressing rate, i.e. 
   (pixels/frame)(frames/sec) ≤ 8 x 10^7. 

   Therefore, the maximum frame rate at full (1024 x 1024) resolution is 
   (8 x 10^7 / 2^20) = 76.29 fr/s. The resolution of each frame can be traded off for 
   a higher frame rate. 

6. Each pixel can be coded in 1, 2, 4, or 8 bits/pixel. 

   The maximum data output rate from the sensor is 640 Mbps. 
   (1 Mbps = 10^6 bits per second) 

7. 512 Mb (2^29 bytes) dynamic RAM with data transfer rate of up to 1160 Mbps. 

8. 99 GB magnetic tape recorder with data transfer rate of up to 240 Mbps 
   (MIL-STD-2179). 
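The trade-off between resolution and frame rate in item 5, and the sensor output rate in item 6, can be checked with a short calculation. The function names are hypothetical; the 80 Mp/s limit and the bit depths are taken from the list above.

```python
MAX_PIXEL_RATE = 80e6  # pixels/second, sensor pixel addressing limit (item 5)


def max_frame_rate(width, height, pixel_rate=MAX_PIXEL_RATE):
    """Highest frame rate (frames/second) for a given sub-frame size."""
    return pixel_rate / (width * height)


def sensor_output_bps(bits_per_pixel, pixel_rate=MAX_PIXEL_RATE):
    """Sensor output data rate in bits/second at the full addressing rate."""
    return pixel_rate * bits_per_pixel
```

Here `max_frame_rate(1024, 1024)` reproduces the 76.29 fr/s figure, a 512 x 512 sub-frame allows roughly four times that rate, and `sensor_output_bps(8)` gives the 640 Mbps maximum.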


Communications Link Capabilities 

The following downlink data rates are derived from "An Investigation of Available 
Communications Link Capabilities for Space Experiments Employing HHVT" which 
was submitted by Analex Corporation on February 1, 1988. This report contains more 
detailed explanations of the scenarios considered. 

1. Short Transmissions 

If the data is transmitted over a period of time comprising a small fraction (less 
than 1/10) of an orbit, it should be scheduled for a section of the orbit for which there 
is TDRSS coverage. In this situation the best and worst case downlink rates for the 
various vehicles are 


A. Space Shuttle 
   1. Best case - no time sharing                                 48 Mbps 
   2. Worst case - 5-way time sharing                             9.6 Mbps 

B. Spacelab / Space Shuttle 
   1. Best case - no time sharing or multiplexing                 48 Mbps 
   2. Worst case - 2-way time sharing and 16:1 multiplexing       1.5 Mbps 

C. Space Station Freedom 
   1. Best case - time sharing where HHVT has the necessary 
      block of time reserved                                      50 Mbps 
   2. Worst case - 6:1 multiplexing                               7.0 Mbps 

D. USLab / Space Station Freedom 
   1. Best case - time sharing where HHVT has the necessary 
      block of time reserved                                      75 Mbps 
   2. Worst case - 16:1 multiplexing                              4.0 Mbps 
2. Long Transmissions 

If the data is to be transmitted over a period of time comprising a significant 
fraction of an orbit or more, the coverage of the carrier by TDRSS, as well as antenna 
blockage, must be included in the link availability calculations. In this situation the 
best and worst cases are: 

A. Space Shuttle 
   1. Best case - 90% coverage                                    43.2 Mbps 
   2. Worst case - 5-way time sharing, 52% coverage, 
      rendezvous activity                                         2.4 Mbps 

B. Spacelab / Space Shuttle 
   1. Best case - 90% coverage                                    43.2 Mbps 
   2. Worst case - 2-way time sharing and 16:1 multiplexing, 
      52% coverage, rendezvous activity                           0.375 Mbps 

C. Space Station Freedom 
   1. Best case - 6-way time sharing, 85% coverage                7.1 Mbps 
   2. Worst case - 6:1 multiplexing, 85% coverage                 6.0 Mbps 

D. USLab / Space Station Freedom 
   1. Best case - 10-way time sharing, 85% coverage               6.4 Mbps 
   2. Worst case - 16:1 multiplexing, 85% coverage                3.4 Mbps 

Experiment #102 - Solid Surface Combustion 

General Description of Experiment and Image Content 

A piece of ashless filter paper is ignited via a hot wire in an O2/N2 environment 
and burns. The propagating flame is recorded as it moves across the paper. 

Information to be Derived from the Video Record 

Edge of flame, color. 

Video System Requirements 

1. 2 views, color, 256 intensity levels 

2. Resolution: measure dimensions of 0.02 cm to within 10% in a field of view 
   of 10 cm x 5 cm -> 5000 x 2500 pixels 

3. Frame rate: 64 fr/s for 3 min 

4. Runs per flight: 3 

Required Data Acquisition and Storage 

( 64 frames/second )( 180 seconds ) = 11,520 frames 
( 11,520 frames )( 5000 x 2500 pixels/frame ) = 144 Gigapixels 
( 144 Gigapixels )( 3 Bytes/pixel ) = 4.32 x 10^11 B required storage/view/run 
( 4.32 x 10^11 B )( 3 runs )( 2 views ) = 2.59 x 10^12 B required storage/flight 

Baseline Data Acquisition and Storage 

The baseline sensor will be monochrome only, with a maximum resolution of 1024 
x 1024 pixels. 

( 64 frames/second )( 180 seconds ) = 11,520 frames 
( 11,520 frames )( 1024 x 1024 pixels/frame ) = 12,080 Megapixels 
( 12,080 Megapixels )( 1 Byte/pixel ) = 1.208 x 10^10 B storage/view/run 

This is much more data than can be stored in the dynamic RAM, so magnetic 
tape storage is needed. The use of the magnetic tape recorder imposes a maximum 
data storage rate of 240 Mbps. The possible approaches to compressing the data for 
storage will be discussed later. In the meantime the data storage can be calculated 
as follows. 

( 240 Mbps )( 180 s ) = 43.2 Gb = 5.40 x 10^9 B storage/view/run 
( 5.40 x 10^9 B )( 3 runs ) = 1.62 x 10^10 B storage/view/flight 

This total is less than the capacity of one reel of magnetic tape. 
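The storage figures for this experiment can be reproduced with the short calculation below. It is a sketch for checking the arithmetic only; the function and variable names are hypothetical, and all input values are taken from the requirements and baseline specifications above.

```python
def storage_bytes(frame_rate, seconds, width, height, bytes_per_pixel):
    """Uncompressed storage for one view of one run, in bytes."""
    return frame_rate * seconds * width * height * bytes_per_pixel


# Required: 64 fr/s for 180 s, 5000 x 2500 pixels, 3 bytes/pixel (color).
required_per_view_run = storage_bytes(64, 180, 5000, 2500, 3)
required_per_flight = required_per_view_run * 3 * 2      # 3 runs, 2 views

# Baseline sensor: 1024 x 1024 pixels, 1 byte/pixel (monochrome).
baseline_per_view_run = storage_bytes(64, 180, 1024, 1024, 1)

# Tape-limited: at most 240 Mbps can be written for the 180 s run.
tape_limited_per_view_run = 240e6 * 180 / 8
tape_limited_per_flight = tape_limited_per_view_run * 3  # 3 runs
```

The baseline sensor produces about 1.2 x 10^10 B per view per run, more than twice the 5.4 x 10^9 B that the tape recorder can accept in the same 180 seconds, which is the gap that compression for storage has to close.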

Required Downlink Rate 

The data from each run must be transmitted within 12 hours. 

( 4.32 x 10^11 B/view )( 2 views ) = 8.64 x 10^11 B = 6912 Gb 
( 6912 Gb ) / [( 12 hr )( 3600 sec/hr )] = 160 Mbps 

Baseline Downlink Rate 

The data from each run must be transmitted within 12 hours. 

( 43.2 Gb/view )( 2 views ) = 86,400 Mb 
( 86,400 Mb ) / [( 12 hr )( 3600 sec/hr )] = 2.0 Mbps 
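The two downlink rates follow from dividing the data volume of one run by the 12 hour transmission window. A minimal check, with a hypothetical function name:

```python
def downlink_mbps(total_bytes, hours):
    """Average rate in Mbps needed to send total_bytes within the given hours."""
    return total_bytes * 8 / (hours * 3600) / 1e6


required_mbps = downlink_mbps(2 * 4.32e11, 12)   # two views, required system
baseline_mbps = downlink_mbps(2 * 5.40e9, 12)    # two views, tape-limited baseline
```

These evaluate to 160 Mbps and 2.0 Mbps respectively, matching the figures above.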

Data Compression Requirements 

The baseline system without data compression falls short of meeting the 
acquisition and storage requirements of this experiment in a number of areas. 
Therefore, video data compression must be considered. Since the baseline system does 
not provide color information, the remaining important features to be derived from 
the image are the intensity of the flame and its edges. Therefore, any compression 
technique to be considered should preserve, as much as possible, edge information and 
edge location. 

The first area in which the baseline system is inadequate is resolution. The limit 
of 1024 x 1024 pixels is below the required resolution by a factor of 5 in one dimension 
and 2.5 in the other. An optical system containing Fourier transform optics 
implemented in front of the sensor might be able to use the 1024 x 1024 available 
pixels to record high spatial frequency information at the expense of low frequency 
information in order to allow the detection of objects beyond the original resolution of 
the sensor. However, this type of system does not seem to be very practical for this 
experiment since low frequency information is also necessary. An alternate approach 
is to reduce the field-of-view to 2 cm x 1 cm combined with electro-mechanical tracking 
of the image. This would permit direct use of the RAM. 

The second issue involves the storage of the video data. As mentioned above, the 
amount of data which needs to be stored precludes the use of dynamic RAM for storage. 
Even using the RAM as a buffer for magnetic tape storage will not suffice. Therefore, 
the data cannot be stored at a rate exceeding the maximum data transfer rate of the 
magnetic tape unit, 240 Mbps. The sensor, however, is producing data at the rate of 

( 1,048,576 pixels/fr )( 64 fr/s ) = 67.1 Mp/s. 
In order to take full advantage of the data acquisition rate of the sensor, all of 
the data produced by the sensor must be compressed to 240 Mbps in real time. In 
other words, the image must be compressed from 8 bpp to (240 / 67.1) = 3.58 bpp at 
an input rate of 67.1 Mp/s. The entropy of the experiment images is expected to exceed 
3.58 bpp during the time of experiment execution. It may be possible to use a lossless 
technique. However, without specific knowledge of the image statistics, it is not 
possible to determine if the RAM (with a storage capacity of 1/2% of the total data 
storage) can be used as a buffer for a lossless technique. Therefore, an analysis of the 
choice of a lossy technique follows. 

The specifications of the compression technique are high speed, relatively low 
compression, implemented between the sensor and the tape recorder. These 
requirements point to using a straightforward technique that can be implemented in 
the parallel pipeline. As discussed above, two-dimensional predictive and transform 
techniques are appropriate. 

Predictive techniques tend not to accurately preserve the location of edges 
because of slope overload. Also, the output bit rate of transform techniques can be 
adjusted more easily than that of predictive techniques. Therefore, a transform 
technique is the most appropriate for this application. It can provide the necessary 
compression in real time while preserving the location of the edges fairly well. 

An alternate approach would be to use a sub-frame of 800 x 800 pixels with a 
reduced field-of-view to maintain resolution. This would require electronic or 
electro-mechanical tracking of the burning region, either of which would be technically 
feasible. 

The third area needing compression is downlink transmission. The downlink 
time will cover many orbits, so the "long transmission" bit rates should be used in this 
analysis. Although the baseline system requires a downlink rate of only 2.0 Mbps, 
running this experiment on a Spacelab mission (which is where it is presently proposed 
to run) might require significantly lower data rates, depending on the requirements 
of the other experiments. In the worst case the data would have to be compressed by 
an additional factor of 5.33 to 0.375 Mbps (0.67 bpp). This compression can be done 
at a comparatively slow downlink rate (1 Mp/s) which will allow more complex 
techniques to be considered. 
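The bit budgets for storage and for the worst-case downlink can be verified as follows. The variable names are hypothetical; the rates are those quoted in this case study (240 Mbps tape rate, 64 fr/s at 1024 x 1024 pixels, 2.0 Mbps baseline downlink, 0.375 Mbps worst-case Spacelab link).

```python
sensor_pixel_rate = 1024 * 1024 * 64          # pixels/second, about 67.1 Mp/s

# Storage: bits per pixel that fit through the 240 Mbps tape interface.
storage_bpp = 240e6 / sensor_pixel_rate       # about 3.58 bpp
storage_ratio = 8 / storage_bpp               # compression relative to 8 bpp

# Downlink: further reduction from 2.0 Mbps to 0.375 Mbps.
downlink_factor = 2.0e6 / 0.375e6             # about 5.33
downlink_bpp = storage_bpp / downlink_factor  # about 0.67 bpp
```

The storage stage therefore needs only a little more than 2:1 compression, but at the full sensor rate, while the downlink stage needs a further factor of about 5.33 at a much lower pixel rate.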

A significant amount of compression could likely be obtained from interframe 
prediction, with or without motion compensation. 

Based on a comparison of the above techniques, it is recommended that the 
two-dimensional fast Hadamard transform be considered to achieve the compression 
from 8 bpp to 3.58 bpp. To achieve the additional compression of 5.33 it is recommended 
that interframe prediction in the transform domain without motion compensation be 
used, followed by run-length entropy encoding. 





Experiment #228 - Bubble-in-Liquid Mass Transport Phenomena 

General Description of Experiment and Image Content 
A bubble of gas is injected into a liquid under controlled pressure conditions. The 
pressure is adjusted to maintain an unstable equilibrium between the bubble and the 
surrounding liquid. At some point the pressure is increased, and the bubble begins to 
dissolve. 

Information to be Derived from the Video Record 

The precise diameter of the bubble is measured in each image. The diameter 
measurement is resolved to two microns. 

Video System Requirements 

1. 2 views, monochrome, 256 gray levels. 

2. Resolution: 

A. Desired 

Resolve bubble diameter to 0.002 mm in a field of view of 
4 mm x 4 mm (obtained by zooming from full field of 1.5 cm x 1.5 cm) 
- To accomplish this, we need 2000 x 2000 pixels. 

B. Acceptable 

1024 x 1024 pixels (baseline maximum resolution) 

3. Frame rate: 

   A. Injection     1000 fps for 1 s (Desired) 
                    100 fps for 1 s (Acceptable) 

   B. Equilibrium   1 fps for 300 s [480 sec]* 

   C. Initiation    1000 fps for 1 s (Desired) 
                    100 fps for 1 s (Acceptable) 

   D. Dissolution   1 fps for 300 s [1800 sec]* 

   * from Experiment Timeline 

4. Runs per flight: 4 

Required Data Acquisition and Storage 

A. Desired 

( 1000 fr/s )( 2 s ) + ( 1 fr/s )( 2280 s ) = 4280 fr 
( 4280 fr )( 2000 x 2000 p/fr ) = 17,120 Mp 
( 17,120 Mp )( 1 Byte/p ) = 1.71 x 10^10 B req. storage/view/run 
( 1.71 x 10^10 B )( 2 views )( 4 runs ) = 1.37 x 10^11 B req. storage/flight 

B. Acceptable 

( 100 fr/s )( 2 s ) + ( 1 fr/s )( 2280 s ) = 2480 fr 
( 2480 fr )( 1024 x 1024 p/fr ) = 2.60 Gp 
( 2.60 Gp )( 1 Byte/p ) = 2.60 x 10^9 B req. storage/view/run 
( 2.60 x 10^9 B )( 2 views )( 4 runs ) = 2.08 x 10^10 B req. storage/flight 

Baseline Data Acquisition and Storage 

Since the sensor is being used at full resolution, the frame rate is limited to 76.29 
fps (see Background above). 
( 76.29 fr/s )( 2 s ) + ( 1 fr/s )( 2280 s ) = 2433 fr 
( 2433 fr )( 1024 x 1024 p/fr ) = 2.55 Gp 
( 2.55 Gp )( 1 Byte/p ) = 2.55 x 10^9 B storage/view/run 

Since the capacity of the RAM will be only 512 Mb, storage on magnetic tape 
will be required. Although the data rate of the tape drive will be limited to 
240 Mbps, using the RAM as a buffer will allow all of the above data to be captured 
at the given frame rates, since the 76.29 Mb/s capture is for just one second 
at a time. Therefore, the storage in the buffer only reaches about 50 Mb before the 
input rate drops to 1 Mb/s, and the buffer can be emptied over the next few seconds, 
as shown in Figure 14. 


Figure 14: Dynamic RAM buffer capacity for Experiment #288 (axis label: RAM in use)
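
The buffering argument can be illustrated with a simple time-stepped simulation. This is a sketch under stated assumptions: the function simulate_buffer and the 0.01 s step are hypothetical, the capture and tape rates are used exactly as quoted in the text (76.29 and 1 for the input, 30 for the 240 Mbps tape drive, all in the same megabyte units), and the 10 s observation window after the burst is arbitrary.

```python
def simulate_buffer(phases, drain_rate, step=0.01):
    """Return (peak_fill, time_when_emptied) for a buffer drained at drain_rate.

    phases is a list of (input_rate, seconds) pairs; all rates share one unit.
    """
    fill = peak = 0.0
    t = 0.0
    emptied_at = None
    for input_rate, seconds in phases:
        for _ in range(round(seconds / step)):
            # The buffer can never hold a negative amount of data.
            fill = max(0.0, fill + (input_rate - drain_rate) * step)
            t += step
            peak = max(peak, fill)
            if fill == 0.0 and peak > 0.0 and emptied_at is None:
                emptied_at = t
    return peak, emptied_at


# One 1 s high-speed burst followed by 10 s of the 1 fps phase.
peak, emptied_at = simulate_buffer([(76.29, 1.0), (1.0, 10.0)], drain_rate=30.0)
print(f"peak fill {peak:.1f}, emptied after {emptied_at:.2f} s")
```

With these figures the peak fill is a little under the "about 50 Mb" quoted above, and the buffer empties within a few seconds of the burst.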


The storage capacity required for this experiment is 
( 2.55 x 10^9 B )( 4 runs ) = 1.02 x 10^10 B storage/view/flight.

This total is within the capacity of one reel of magnetic tape. 

Required Downlink Rate 

Approximately 1 fps (almost real time) for up to 2 minutes for ground control 
(position, zoom) of sensor. 

A. Desired

( 1 fr/s )( 2000 x 2000 p/fr ) = 4.0 Mp/s
( 4.0 Mp/s )( 8 bpp ) = 32.0 Mbps

B. Acceptable

( 1 fr/s )( 1024 x 1024 p/fr ) = 1.049 Mp/s
( 1.049 Mp/s )( 8 bpp ) = 8.39 Mbps

Baseline Downlink Rate 
( 1 fr/s )( 1024 x 1024 p/fr ) = 1.049 Mp/s 
( 1.049 Mp/s )( 8 bpp ) = 8.39 Mbps 

Since the requirements are for almost real time downlink for a short period of 
time, we will assume the downlink is taking place during a period of time in which 
the carrier vehicle (Space Shuttle or SS Freedom) has TDRSS coverage. 

Data Compression Requirements 

The baseline HHVT system almost meets the video image acquisition and storage 
"acceptable" requirements of this experiment without the need for data compression. 
The only exception is the reduction in frame rate from 100 fps to 76.29 fps for the 
injection and initiation phases of the experiment. Since the frame rate is limited by
the pixel addressing rate of the sensor, not much can be done about it. 

The downlink bpp requirement varies greatly depending on the vehicle and the 
number of other experiments sharing the Ku-band link. Most likely, no compression 
will be necessary for the downlink. In the worst case the image might have to be 
compressed to 1.5 Mbps, or 1.43 bpp. Since the only feature of interest in the 
transmitted image is the boundary of the bubble, a reasonable approach to compression 
is to transmit only the edges. This can be done by using an edge detection technique 
and transmitting only the locations of pixels classified as edges, with or without the
intensity values. A technique of this type will easily be able to achieve the necessary 
compression ratio which is less than 6:1. 
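
The worst-case figures quoted above follow from dividing the available link rate by the pixel rate. A minimal sketch, assuming a hypothetical helper bits_per_pixel and the 1.5 Mbps worst-case link rate stated in the text:

```python
def bits_per_pixel(link_bps, frames_per_second, width, height):
    """Bits available per pixel on a link of link_bps bits per second."""
    return link_bps / (frames_per_second * width * height)


UNCOMPRESSED_BPP = 8  # 256 gray levels

worst_case_bpp = bits_per_pixel(1.5e6, 1, 1024, 1024)
ratio = UNCOMPRESSED_BPP / worst_case_bpp
print(f"{worst_case_bpp:.2f} bpp, compression ratio {ratio:.2f}:1")
```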

There are several techniques that could be very useful for this experiment. Part 
of the synthetic highs technique [135] can be used by extracting and transmitting only 
the edge information. The method of Robinson [136] uses two-dimensional directional 
mask operators to extract edges. The edge pixels are then chain coded for transmission. 
Although either method would probably be sufficient for this experiment, the latter 
one is more advanced and will likely produce a better edge picture at a lower bit rate. 
Both of these techniques involve filtering by convolution which uses many 
multiplications per pixel, but high-speed compression is not required for downlink
transmission because the input rate is only 1.05 Mp/s. Another technique that will
provide high-speed edge detection involves the intensity-dependent spatial summation 
(IDS) operator [137]. This technique is being implemented in hardware for Langley 
Research Center [138]. 

Based on a comparison of the above techniques and projected development status, 
it is concluded that a two-dimensional directional mask operator or IDS operator should
be used to extract edge maps for compressed data transmission. 
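
As an illustration of edge extraction with two-dimensional directional masks, the sketch below convolves a frame with four 3 x 3 compass-style masks and keeps only the locations of strong responses. It is not the method of [136] itself: the mask coefficients, the threshold, the function name edge_locations, and the synthetic test frame are all assumptions chosen for the example, and the chain coding step is omitted.

```python
# Four 3x3 directional masks (0, 45, 90, 135 degrees). The opposite four
# compass directions are their negatives, so absolute responses cover all eight.
MASKS = [
    [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]],
    [[0, 1, 2], [-1, 0, 1], [-2, -1, 0]],
    [[1, 2, 1], [0, 0, 0], [-1, -2, -1]],
    [[2, 1, 0], [1, 0, -1], [0, -1, -2]],
]


def edge_locations(image, threshold):
    """Return (row, col) of interior pixels whose strongest mask response exceeds threshold."""
    rows, cols = len(image), len(image[0])
    edges = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            best = 0
            for mask in MASKS:
                response = sum(mask[i][j] * image[r - 1 + i][c - 1 + j]
                               for i in range(3) for j in range(3))
                best = max(best, abs(response))
            if best > threshold:
                edges.append((r, c))
    return edges


# Synthetic 8x8 frame: dark background with a bright 4x4 square standing in for a bubble.
frame = [[200 if 2 <= r <= 5 and 2 <= c <= 5 else 20 for c in range(8)]
         for r in range(8)]
edges = edge_locations(frame, threshold=300)
print(edges)
```

Only the list of edge locations would need to be transmitted; pixels in the uniform interior of the square and in the flat background away from its boundary produce no response and are dropped.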


135. Schreiber, W.F., Knapp, C.F., Kay, N.D., "Synthetic Highs - An Experimental TV Bandwidth
Reduction System," Journal of SMPTE, vol. 68, pp. 525-537, August 1959.

SPIE (also in Optical Engineering), vol. 87, pp. 117-125, 1976.

137. Cornsweet, T.N. and Yellott, Jr., J.I., "Intensity-Dependent Spatial Summation," Journal of the
Optical Society of America A, vol. 2, pp. 1769-1786, October 1985.

138. Huck, F.O., "Local Intensity Adaptive Image Coding," NASA Data Compression Workshop,
Experiment #230 - Nucleate Pool Boiling

General Description of Experiment and Image Content 
Freon is heated locally by means of a large current passed through a thin gold
coating on quartz. At some point the freon begins to boil. Vapor bubbles form, grow,
and depart from the surface.

Information to be Derived from the Video Record 

Nucleation site density
Nucleation frequency at a given site
Bubble shape
Bubble growth, collapse, departure, motion after departure
Existence of fluid micro-layer underneath bubble

Video System Requirements

1. Monochrome, 10 - 20 gray levels (we will assume 16 levels since this would
take full advantage of 4 bpp)

2. Resolution: resolve an object of dimension 0.005 inch in a field of view of
2.5 inch x 5 inch - To accomplish this, we need 500 x 1000 pixels. (We will

3. Frame rate: First 6 sec (avg.): 1000 fps desired, 100 fps acceptable. Next
120 sec: 10 fps

4. Runs per flight: 9 

Required Data Acquisition and Storage 
A. Desired

( 1000 fr/s )( 6 s ) + ( 10 fr/s )( 120 s ) = 7200 fr
( 7200 fr )( 1024 x 512 p/fr ) = 3.77 Gp
A 16-gray-level image requires 4 bpp (0.5 bytes/pixel).

( 3.77 Gp )( 0.5 bytes/p ) = 1.89 x 10^9 B req. storage/run
( 1.89 x 10^9 B )( 9 runs ) = 1.70 x 10^10 B required storage/flight

B. Acceptable

( 100 fr/s )( 6 s ) + ( 10 fr/s )( 120 s ) = 1800 fr
( 1800 fr )( 1024 x 512 p/fr ) = 944 Mp
( 944 Mp )( 0.5 bytes/p ) = 4.72 x 10^8 B req. storage/run
( 4.72 x 10^8 B )( 9 runs ) = 4.25 x 10^9 B required storage/flight

Baseline Data Acquisition and Storage

Sub-imaging will allow the sensor to produce 152.6 fps since each frame is only
0.524 Mp.

( 152.6 fr/s )( 6 s ) + ( 10 fr/s )( 120 s ) = 2115 fr
( 2115 fr )( 1024 x 512 p/fr ) = 1109 Mp
( 1109 Mp )( 0.5 bytes/p ) = 5.55 x 10^8 B storage per run
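
The baseline frame rates used in these case studies follow from the sensor's pixel addressing rate divided by the number of pixels per frame. A minimal sketch, assuming the nominal 80 Mp/s sensor output quoted elsewhere in this report; the function name max_frame_rate is hypothetical:

```python
PIXEL_RATE = 80e6  # pixels per second, nominal sensor output


def max_frame_rate(width, height, pixel_rate=PIXEL_RATE):
    """Highest frame rate the sensor can sustain for a given (sub-)image size."""
    return pixel_rate / (width * height)


full_frame = max_frame_rate(1024, 1024)  # full resolution
sub_image = max_frame_rate(1024, 512)    # half-height sub-image
print(f"{full_frame:.2f} fps full frame, {sub_image:.1f} fps sub-image")
```

Halving the number of pixels read out doubles the achievable frame rate, which is the trade that sub-imaging exploits.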




Since the capacity of the RAM will be only 512 Mb, storage on magnetic tape will be
required. Although the data rate of the tape drive will be limited to 240 Mbps, using
the RAM as a buffer will allow all of the above data to be captured at the given frame
rates, since the 76.29 Mb/s capture is only required for 6 seconds. Therefore, the
storage in the buffer only reaches about 300 Mb before the input rate drops to 5 Mb/s,
and the buffer can be emptied over the next several seconds, as shown in Figure 15.

Figure 15: Dynamic RAM buffer capacity for Experiment #230 (axis label: RAM in use)

( 5.55 x 10^8 B )( 9 runs ) = 4.99 x 10^9 B storage/flight

This total is less than the capacity of one reel of magnetic tape.

Required Downlink Rate

No downlink requirements.

Baseline Downlink Rate

No downlink requirements.

Data Compression Requirements 

The baseline HHVT system meets the video imaging "acceptable" requirements 
of this experiment without the need for data compression. The only area in which 
there is room for improvement is the frame rate. Although the "acceptable" frame rate 
is exceeded, the "desired" frame rate of 1000 fps cannot be approached. Since the frame 
rate is limited by the pixel addressing rate of the sensor, only an optical compression 
technique that is implemented in front of the sensor could help. This technique would 
have to reduce the number of pixels required to achieve the necessary resolution, 
thereby allowing the sensor to trade fewer pixels per frame for a higher frame rate. 
INTEGRATION OF VDC INTO HHVT SYSTEM

Since the goal of the HHVT system is to provide as much scientific information 
as possible, video data compression (VDC) should preserve the salient features of the 
image. In addition, the HHVT system must meet storage and downlink limitations. 
Therefore, we need to consider two stages of compression, one before storage on tape 
and one for downlink. Whenever the data acquisition rate is above the magnetic tape 
storage rate (30 Mb/s), the data should be initially compressed only enough so that it 
can be stored on tape. If further compression is required for downlink transmission,
a second compression algorithm can be applied. Finally, much of the compression, 
especially before storage, must occur in real- or near-real-time. Hardware 
implementations of VDC algorithms deserve serious consideration. 

Several features of the baseline HHVT system’s design suggest prominent points 
for the integration of VDC. HHVT will employ eight parallel lines out of the sensor. 
This architecture suggests the use of a parallel processing architecture, especially in 
the first stage of compression before storage. If the sensor’s output is a nominal 
80 Mp/s, each line could process at 10 MHz. This is within the capability of current 
hardware video processing technology. 

Some current hardware-based VDC implementations involve predictors like 
DPCM and DM. Another algorithm which has much promise for VDC is the
Lempel-Ziv-Welch algorithm [139][140]. While the algorithm requires buffering for a 
variable output bit rate and a code table [141], it can provide either lossy or lossless 
compression. Good quality images can be reconstructed at 1.24 bpp. Using DCT 
coefficients, the LZW algorithm yields compression ratios of 16:1 [142]. Linear filtering 
(convolutions using DFTs) occurs at high processing speeds through the use of DSPs. 
Also, DSPs have evolved into graphics system processors such as Texas Instruments’ 
TMS34010 and 34020 devices, National Semiconductor’s Advanced Graphics Chip Set, 
the AMD 95C60, the Intel 82786, and Hitachi’s 63484, among others. These processors,
operating at rates up to 60 MHz, are optimized to perform graphic and image 
processing functions. These processors perform block operations, such as matrix 
rotations, in a single operation. They also work with floating point number accuracy. 
Another area of great potential for HHVT is the success with application specific 
integrated circuits or ASICs. ASICs are semiconductor devices, often in GaAs or 
CMOS, which use basic logic elements, special memory cells, adders, and multipliers 
integrated onto a single die or chip [143]. The processing speed of these devices is 
often the fastest for any particular hardware implementation of a VDC algorithm. 
DPCM, DCT, MC, and DFT operations are a few of the algorithms which have been 
successfully implemented. 
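
To make the code-table requirement of the Lempel-Ziv-Welch algorithm concrete, the sketch below shows the basic lossless form of LZW operating on one line of 8-bit pixels. It is a textbook illustration rather than the hybrid scheme of [142]; the function names, the unbounded table size, and the sample line are assumptions.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Encode bytes as a list of LZW code-table indices (lossless)."""
    table = {bytes([i]): i for i in range(256)}
    codes = []
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # grow the code table
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the original bytes, regenerating the same code table."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        else:
            # Code was created by the encoder on this very step.
            entry = previous + previous[:1]
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


# One image line: dark background, a bright run, dark background again.
row = bytes([20] * 12 + [200] * 8 + [20] * 12)
codes = lzw_encode(row)
print(len(row), "pixels ->", len(codes), "codes")
```

The number of output codes depends on the image content, which is why a buffer is needed to smooth the variable output bit rate.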

The following two sections examine some of the unique VDC requirements of 
HHVT. They also address in more detail the implementation of VDC into the system. 
The first section examines electronic, or hardware, implementations. The second 
section examines the use of optical filtering to improve high spatial frequency response 
and edge preservation in VDC. 


139. Ziv, J. and Lempel, A., "A Universal Algorithm for Sequential Data Compression," IEEE
Transactions on Information Theory, vol. IT-23, No. 3, pp. 337-343, May 1977.

142. Lewis, Jr. H. Garton and Forsyth, William B., "Hybrid LZW Compression," Proceedings of the
NASA Langley sponsored International Workshop on Visual Information Processing for
Television and Telerobotics, Williamsburg, Virginia, May 1989.
Electronic Implementations

The baseline system’s sensor will have an 80 Mp/s output rate. Hardware
implementations are likely to require some form of parallel processing. DPCM using 
a one-dimensional predictor can be implemented in the parallel pipeline of the baseline 
system. Eight processors, each operating at 10 Mp/s, can compress the eight parallel 
bit streams emerging from the sensor. However, as indicated in the discussion of 
predictive methods, vertical edges will be poorly represented by a horizontal predictor. 
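
A one-dimensional DPCM coder of the kind that could run on each of the eight lines can be sketched as follows. This is an illustration only: the uniform quantizer step of 8, the mid-gray starting prediction of 128, and the function names are assumptions, and no limit is placed on the code word length.

```python
def dpcm_encode(line, step=8):
    """Quantize the difference between each pixel and the previous reconstructed pixel."""
    codes = []
    prediction = 128  # assumed mid-gray start value for each line
    for pixel in line:
        code = round((pixel - prediction) / step)
        codes.append(code)
        # Track the decoder's reconstruction so quantization errors do not accumulate.
        prediction = max(0, min(255, prediction + code * step))
    return codes


def dpcm_decode(codes, step=8):
    """Rebuild the line by accumulating the quantized differences."""
    line = []
    prediction = 128
    for code in codes:
        prediction = max(0, min(255, prediction + code * step))
        line.append(prediction)
    return line


line = [20, 20, 22, 25, 200, 201, 199, 30]
codes = dpcm_encode(line)
decoded = dpcm_decode(codes)
print(codes)
print(decoded)
```

Because the predictor uses only the previous pixel on the same line, each of the eight processors can work independently of the others.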

It may be possible to implement a two-dimensional predictor for seven out of the 
eight bit streams by delaying each bit stream by one pixel more than the stream above 
it. Doing so will allow the current predictor to use the pixel that was originally directly 
above the current pixel. This also could be implemented at 10 Mp/s per processor. 
Three-dimensional predictive schemes clearly cannot be implemented in the parallel 
pipeline since they require an entire frame to be stored in RAM for use by the predictor. 

The Hadamard transform, although it produces suboptimal compression when 
compared to the DCT, has a great advantage in its speed of implementation. This is 
because the computation of the transform involves only additions without 
multiplications. A pipeline implementation of a two-dimensional Hadamard transform 
can operate at very high rates. The speed of such an implementation is limited by 
whichever is slower: one addition operation or the access time of the RAM. The RAM 
stores the matrix of the intermediate results produced by the first stage of 
one-dimensional transforms. Two one-dimensional transforms determine the 
two-dimensional transform. With today’s high-speed electronics, it should be possible 
to implement a Hadamard transform processor in the parallel pipeline of the HHVT 
system. It would perform an 8 x 8 transform, taking as input 8 groups of 8 parallel 
pixels, and producing 8 groups of 8 transform coefficients. Thus, it could process the 
image at a data rate of 80 Mp/s using the system clock rate of 10 MHz. The delay in
the line for computation of the transform would be about 10 clock periods (1 μs).
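
The addition-only structure of the Hadamard transform, and its computation as two passes of one-dimensional transforms, can be sketched in software as follows. The function names and the test block are assumptions; the transform is left unnormalized and in natural (Hadamard) order.

```python
def fwht(vector):
    """Fast Walsh-Hadamard transform using additions and subtractions only."""
    data = list(vector)
    h = 1
    while h < len(data):
        for start in range(0, len(data), 2 * h):
            for i in range(start, start + h):
                a, b = data[i], data[i + h]
                data[i], data[i + h] = a + b, a - b
        h *= 2
    return data


def hadamard_2d(block):
    """Two-dimensional transform as two passes of one-dimensional transforms."""
    rows = [fwht(row) for row in block]        # first pass: intermediate results
    cols = [fwht(col) for col in zip(*rows)]   # second pass on the columns
    return [list(row) for row in zip(*cols)]   # transpose back


block = [[(r * 8 + c) % 256 for c in range(8)] for r in range(8)]
coeffs = hadamard_2d(block)

# Applying the unnormalized 8 x 8 transform twice scales the block by 64.
restored = [[v // 64 for v in row] for row in hadamard_2d(coeffs)]
print(coeffs[0][0], restored == block)
```

The list named rows plays the role of the intermediate matrix held in RAM between the two stages of one-dimensional transforms.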

Two-dimensional DCTs can be performed on parallel data quite easily. However, 
total throughput is usually no higher than "video rate". Therefore, the transforms 
cannot be implemented in the parallel pipeline when the combined data rate is much 
above the conventional "video rates". One possibility for high throughput involves the 
use of an analog CCD device that performs a one-dimensional DCT in 100 ns ( 10 MHz 
rate) using parallel I/O [144]. Use of this device in the parallel pipeline would require 
D/A and A/D conversion. 

An advantage of transform and some other block techniques is that each block 
is processed and compressed independently. Once the video data is stored in RAM, 
many processors can be used in parallel to perform the compression. Therefore, as 
long as the implementation is fast enough that one block can be compressed in the 
time that one frame is acquired, real time processing can be achieved by using the 
appropriate number of processors in parallel. 

Block Truncation Coding is performed on each block independently. Therefore, 
it can be implemented in parallel by using multiple processors. However, the amount 
of compression is not easily adjustable. 
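
A basic form of Block Truncation Coding, which keeps the mean and variance of each block, can be sketched as below. The function names and the sample 4 x 4 block are assumptions. If the two output levels are sent as 8-bit values, a 4 x 4 block costs 16 + 16 = 32 bits, or 2 bpp, which illustrates why the amount of compression is tied to the block size rather than freely adjustable.

```python
import math


def btc_encode(block):
    """Encode a flat list of pixels as (low, high, bitmap), preserving mean and variance."""
    n = len(block)
    mean = sum(block) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in block) / n)
    bitmap = [1 if p >= mean else 0 for p in block]
    q = sum(bitmap)  # number of pixels at or above the mean
    if q == n:       # uniform block: a single level is enough
        return mean, mean, bitmap
    low = mean - std * math.sqrt(q / (n - q))
    high = mean + std * math.sqrt((n - q) / q)
    return low, high, bitmap


def btc_decode(low, high, bitmap):
    """Replace each bit with the corresponding output level."""
    return [high if bit else low for bit in bitmap]


# A 4 x 4 block (flattened) containing part of a bright edge.
block = [10, 12, 11, 200,
         12, 10, 198, 201,
         11, 199, 202, 200,
         197, 201, 199, 203]
low, high, bitmap = btc_encode(block)
decoded = btc_decode(low, high, bitmap)
print(round(low, 1), round(high, 1), bitmap)
```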
The HVS compensation schemes do not appear to have hardware
implementations. However, there are DSP implementations which perform linear 
filtering operations at very high speeds. Compression schemes based on simple edge 
detection and coding would utilize these devices. With the image broken down into a 
few sub-images, multiple processors would perform the edge detection. This has been 
done for a very simple 2x2 edge detector. The high speed GSP chips planned for the 
near future (TMS 34020) should implement more effective filters at reasonable speeds. 
Implementations of the complex, high-compression HVS schemes at high speeds seem 
beyond the capabilities of current hardware technology. 

In most cases the experimenter would probably want a copy of as much 
information as possible for detailed analysis in addition to the compressed, transmitted 
image. Therefore, whenever the data acquisition rate is above the magnetic tape 
storage rate (30 Mb/s), the data should be initially compressed only enough to be stored 
on tape. If further compression is required for downlink transmission, the data can
be read into RAM, the image can be reconstructed if necessary, and a second
compression algorithm can be applied. Therefore, we need to consider two stages of
compression, one before storage on tape and one for downlink.

The first stage must be able to handle input data rates as high as 80 Mp/s in the 
parallel pipeline configuration. The output rate should be 30 Mb/s which means the 
algorithm must be able to produce a fixed, but adjustable, bit rate in the range of 
3-8 bpp. Within these specifications, we should attempt to minimize degradations 
of the image quality. 
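
The 3-8 bpp range follows from dividing the tape rate by the pixel rate coming out of the sensor. A minimal sketch, taking the 30 Mb/s tape storage rate as the 240 Mbps tape drive rate quoted earlier; the function name allowed_bpp and the intermediate pixel rates are illustrative assumptions:

```python
TAPE_RATE_BPS = 240e6  # tape drive rate in bits per second


def allowed_bpp(pixel_rate, tape_rate_bps=TAPE_RATE_BPS, max_bpp=8):
    """Bits per pixel that can be written to tape at a given sensor pixel rate."""
    return min(max_bpp, tape_rate_bps / pixel_rate)


for rate in (30e6, 40e6, 60e6, 80e6):
    print(f"{rate / 1e6:.0f} Mp/s -> {allowed_bpp(rate):.1f} bpp")
```

At the full 80 Mp/s the first stage must reach 3 bpp, while at 30 Mp/s or below the 8 bpp data fits on tape without compression.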

DPCM techniques can run at a high enough rate. However, the simple algorithms
do not handle edges well at lower bit rates (even a two-dimensional predictor would 
have to revert to one-dimension for one line out of eight in the pipeline). More complex 
algorithms such as adaptive predictors may provide sufficient image quality. 

Transform coding techniques can produce good quality reconstructed images 
when the bit rate is 3 bpp or above. Even edges are fairly well reproduced. The 
Hadamard transform and DCT are easily implemented in the parallel pipeline at 10 
MHz. 

Of the techniques which produce high-quality images, the fast Hadamard 
transform is the easiest to implement. However, if a non-uniform bit allocation scheme 
is used, the bit rate will vary from line to line in the parallel pipeline. This will require 
a multiplexer that can adjust to the bit allocation scheme. A uniform bit allocation 
scheme is also possible, but it will result in lower image quality. 

The compression for downlink can generally be done at a much lower pixel rate 
than the first stage. Also, the output bit rate will be lower. At lower bit rates, less 
information is preserved. Therefore, it becomes more important to tailor the 
compression technique to the specific type of information which must be preserved. 
As a result, a programmable system may be necessary. This would allow for flexibility 
in the choice of compression method to match the experimenter’s needs. The reduced 
speed requirement should make such a system possible. The needs of most of the 
experimenters can probably be met with a short list of techniques. A good edge 
detection technique will have to be included. 
Optical Implementations

Speed is the main advantage of coherent optical transform processing. The entire 
image is transformed at the speed of light. Filtering operations in the transform 
domain are done continuously and are used in a system with any subsequent frame 
rate without synchronization.

A coherent light source is required for an optical method which uses Fourier 
transforms of monochromatic images. Since the Fourier transform is independent of 
spatial position, optical Fourier transforms are the most practical for image processing 
tasks such as identification and classification. However, image compression 
algorithms involving only spatial-frequency filtering are also easily implemented using
Fourier optics.

Spatial filtering is used to vary the spatial bandwidth prior to the digitizing 
process. A He-Ne (helium-neon) laser source is used with a dibutyl-phthalate liquid 
gate, mixing with light from the image to produce a coherent light source. A Fourier 
transform lens forms the spatial Fourier transform of the image at the transparency. 
Various obstacles at the focal plane (of this first lens) result in spatial filtering. The 
image is retransformed by a second Fourier transform lens. This spatially filtered 
image, at the focal plane of a video camera, is then digitized and digitally compressed.

This filtering process has been used with Delta Modulation (DM) by Eichmann,
et al. [146][147]. A computer simulation also verifies significant compression when
this technique is used in conjunction with a DCT encoder. The optical technique
reduces high frequency distortions which manifest themselves as edge blockiness [148].
Consequently, this focal plane process increases the resolvable detail in the compressed 
image. It serves a function similar to pre-emphasis employed in broadcast television. 

Even if the images are not produced with coherent light, coherent optical 
processing can be performed through the use of optically accessible spatial light 
modulators. Incoherent light initializes a modulator, which converts the light to a 
coherent beam of light. This coherent light is proportional in intensity to the original 
image produced with incoherent light. The result is an equivalent image consisting 
of coherent light, which can then be processed using Fourier optics. 

The Pockels Readout Optical Modulator (PROM) is one device that can modulate
spatial light at and above video rates. It has a maximum resolution of 500 lines/mm 
and is typically 25-30 mm in diameter. PROM can obtain a 10,000:1 contrast ratio. 
However, PROM requires blue light for writing and red light for reading. In situations 
where image information is contained in lower frequency light, such as a flame 
experiment where image information occurs in the red and infrared bands, input could 
become a problem. 

Optical techniques may be well suited to certain applications. However, they are 
not easy to implement and may be inappropriate for all situations where a coherent 
light source is not available. Although a modulator can be used to convert an incoherent 
light source to a coherent light beam, the spectra of light from certain experiments 
studied in this report exceed the useful range of the modulator. It may be feasible to 
integrate this technique into some experiments using incoherent light sources within 
the operating range of the modulator. Because of this, caution is required when 
considering a general application of optical, spatial filtering. 
CONCLUSIONS

The following conclusions were developed while studying the literature: 

1. The requirements of three experiments were studied in detail. The three
experiments studied were Solid Surface Combustion, Bubble-in-Liquid Mass 
Transport Phenomena, and Nucleate Pool Boiling. The results of the investigation 
conclude that video data compression approaches for microgravity space experiments 
are experiment peculiar in requirements and no one single approach is universally 
optimum. 

2. It is shown, for the experiments studied, required data compression is 
separable into two approaches, the first to limit data rates for storage, and the second 
to reduce data rates for transmission. 

3. Hardware implementations for high resolution and/or high frame rate 
experiment requirements, and real time compression are currently limited, by 
technology, to methods that can be implemented using parallel processing, digital 
filtering, decomposition, and tree searches. 

4. In general, based on this survey and the state of the art in image coding, 
transform algorithms are preferred over predictive methods. Of the transform 
methods, the Discrete Cosine Transform is the optimal method. It provides the most 
efficient compression. 

5. For HHVT applications, the DCT is the one method which can best meet the 
stringent compression requirements, and maintain image fidelity. Several fast 
algorithms have been developed [149][150]. These algorithms perform at least six times
faster than the Fast Fourier Transform, and these algorithms directly lend themselves 
to hardware implementations. 

6. Coupled with a motion compensation or block matching algorithm in the 
temporal direction, the DCT is currently the best method for high compression, 
interframe coding. 

7. Although theoretically attractive, no approach could be identified for focal 
plane processing alone, that could be implemented with state of the art hardware. Still,
optical techniques are advantageous when used with digital compression to help 
maintain edges and high frequency detail. 


149. Chen, W., Smith, C. H., and Fralick, S., "A Fast Computational Algorithm for the Discrete

150. Lee, B. G., "A New Algorithm to Compute the Discrete Cosine Transform," IEEE Transactions
on Acoustics, Speech, and Signal Processing, vol. ASSP-32, pp. 1243-1245, 1984.
RECOMMENDATIONS FOR FURTHER STUDY

1. User controlled, dynamic image processing which provides image 
enhancement, reconstruction, and manipulation. These techniques could be used to 
offset the perceived degradations in compressed images. They will also aid 
investigators in the analysis of video data. Such techniques also provide needed 
functions in a telescience environment. This development will help to make the HHVT 
system a more useful, and easier to use, system. Investigators can use these methods 
to control the operation of the HHVT system. 

2. A detailed study of experimenters’ requirements. 

3. A study to define specific spectral requirements. 

4. Continued study of promising compression techniques. For example, at the
time of this report, great advances are being made in fractal and other
shape-matching algorithms. These techniques provide several orders of magnitude
improvement in compression. Also, current Human Visual System approaches will 
attempt to identify crucial elements to maintain the perception of a high fidelity 
image. These techniques, however, need further refinement of both algorithms and 
implementations to be useful in the HHVT system. 

5. Further study of optical processing techniques for high data rates. We 
addressed the use of optical techniques primarily as a pre-processing step to digital 
encoding of the image. Implementation of purely optical techniques has been 
difficult and costly. In the past, such systems’ performances have been 
disappointing. Still, since images ultimately start with light, an optical process 
seems to be the ideal method for image compression. Recent literature indicates a 
renewed effort in this area. Of special interest is the renewed effort to model and 
develop focal plane processors based upon human perception, neural responses, and 
pattern matching. 

6. Development of standard methods to evaluate and compare digital image 
compression algorithms. Such standards should be "blind" to specific operating 
system and computer hardware advantages. Ideally, all algorithms would be tested 
in hardware "in situ". However, associated development time precludes this, and 
one should consider software-based benchmark comparisons. Also, one should 
respect the proprietary concerns of algorithm developers. 
APPENDIX 1: BIBLIOGRAPHY DATABASE


The bibliography, resulting from the Task 2.0 study, was prepared using R:BASE 
System V software by Microrim Inc. for the MS-DOS operating system. The database 
files, the applications files, and the program files are contained on the disk provided 
with this report and represent the required deliverable for this study.

The R:BASE files containing the list of articles are ARTICLE7.RBF. The R:BASE 
application for entering articles and printing the bibliography list is contained in 
IOIO.APP, IOIO.API, and IOIO.APX. 

Load R:BASE. At the R:BASE Main Menu select

(1) R:BASE command mode.

The program can be run by typing run ioio in ioio.apx from the R> prompt. The program
used by ioio is REFLIST. The program can be run using menus. The initial menu is
shown below.


HHVT Bibliography

(1) Enter new articles.
(2) Create reference list and store to file.
(3) Exit to Rbase.

Figure 16: HHVT bibliography menu
Highlight (1) Enter new articles. Press (Enter).


The next menu to appear selects the appropriate list. 


VDC   non-VDC

Figure 17: List selection menu
Highlight VDC. Press (Enter).
Next, a screen will show where you can add or list data entries in the Bibliography
Database. The database contains the following information for each article:

Reference Number
Author(s)

Title 

Journal 

Volume 

Pages 

Date 

Category 

Method 


Press [ESC] for the menu

Video Data Compression Bibliography

Ref. #:
Author:
Title:
Journal:
Vol.:
pp.:
Date:
Category:
Method:

[ESC] Done
Form: bibliog   Table: bibliog   Field: refnum   Page: 1

Figure 18: Bibliography data input listing prompts

Each article is identified using descriptors for the general groupings into 
categories, and a descriptor for the compression methods used in the article. The 
category descriptors used are 

Block 

HVS 

Hybrid 

Implementation 

Predictive 

Reversible 

Survey 

Transform 
The menu across the bottom of the screen displays key strokes which modify the
database files. When you have completed making changes or additions to the database,
press (Esc). The following menu appears across the screen top.


Add   Duplicate   Edit again   Discard   Quit

Author:   Pirsch, Peter
Title:    VLSI Implementation for Visual Communications
Category: Implementation
Method:   DPCM, DCT

Form: bibliog   Table: bibliog   Field: method   Page: 1   Changed

Figure 19: Completed example of bibliography data input

Highlight Add. Press (Enter). This enters the changes to the database.
Press (Esc) three times to return to the HHVT Bibliography menu.
At this screen, in Figure 21, enter the name of the file where R:BASE will store
the ASCII text file. In the example shown, the file is named BIBLIOG.TXT. This
file can be used in a word processor to prepare and print out a
bibliography listing.

The following menu demonstrates how to exit from the Database program. Select
item (3) and press (Enter). At the R> prompt, type exit. This is shown in
the following figure. Again, the DOS prompt will appear.


R>exit

HHVT Bibliography

(1) Enter new articles.
(2) Create reference list and store to file.
(3) Exit to Rbase.

Figure 22: Exiting the Bibliography Database


The database can be queried using R:BASE to manipulate the data, and any 
sorted and/or selected subset of the bibliography can be presented. For detailed 
information on how to query the database, refer to the R:BASE manuals. 
APPENDIX 2: COMPRESSION PROGRAM

Three computer programs were written to aid in the comparative evaluation of 
video compression algorithms. The programs were written using QuickBASIC 4.0 by 
Microsoft. The first of the three, 2D_TRNSF.BAS, is used to evaluate the Cosine, Fast 
Walsh-Hadamard, and Slant transform algorithms. It is based on an original image 
having 8 bpp. 2D_TRNSF.BAS can be used for comparing results of transform
compression algorithms by varying the following parameters: 

1. Image size, (Random, 8 x 8, 16 x 16, 32 x 32, 64 x 64, and 128 x 128; Bubble,
8 x 8, and 16 x 16; Edge, 8 x 8)

2. Block size, (8 x 8, and 16 x 16) 

3. Image, (Random, Bubble, and Edge) 

4. Transform algorithm, (DCT, WHT, SLANT, No Transform) 

5. Compression, (3, 1, 0.5 bpp) 

6. Quantization scheme, (Linear, Max/Gaussian) 

The program 2D_TRNSF is listed in the following section. There are two program 
files for 2D_TRNSF, an executable compiled version of the program which can be run 
by typing 2D_TRNSF at the DOS prompt, and a file which can be run using 
QuickBASIC. The program execution, input and output, and user interface are the 
same. Use of the program will be illustrated using QuickBASIC (for detailed 
information refer to the QuickBASIC users’ manual). 

The QuickBASIC screen is shown in Figure 23. 

The QuickBASIC menu used to open the program is shown in Figure 24. Open 
Program is selected using the cursor movement keys, or using a mouse. 



Figure 24: QuickBASIC menu, OPEN PROGRAM 

A listing of files will be shown as illustrated in Figure 25. The selected file is 
typed, or placed, in the File Name: box and entered. 


File  Edit  View  Search  Run  Debug  Calls  Options 

Open Program 

File Name: 2D_TRNSF.BAS 

C:\QB45 

Files 

BATCH.BAS      CONTOUR2.BAS   DCTDPCM.BAS 
BBP3.BAS       CONTOUR3.BAS   DECPLOT.BAS 
BOOT2.BAS      COSINE.BAS     DEMO1.BAS 
BOOTCGA.BAS    CQ1-D.BAS      DEMO2.BAS 
CAL.BAS        CQGAUSS.BAS    DEMO3.BAS 
CHECK.BAS      CQLINEAR.BAS   EDGEDET.BAS 
COLORS.BAS     DATCHEK.BAS    EPSON.BAS 
CONTOUR.BAS    DATES.BAS      FASTWHT.BAS 
               DCT.BAS        FEEDS.BAS 

Dirs/Drives 

< OK >   < Cancel >   < Help > 

F1=Help  Enter=Execute  Esc=Cancel  Tab=Next Field  Arrow=Next Item 


Figure 25: 2D_TRNSF.BAS program selected from list 


The selected program will be listed as shown in Figure 26. 


[Screen: QuickBASIC editor showing the opening DECLARE and $INCLUDE lines and the header comments of 2D_TRNSF.BAS] 


Figure 26: QuickBASIC program, 2D_TRNSF.BAS, loaded 


The program is run by selecting Run, and from the Run menu selecting Start, 
as shown in Figure 27. 

[Screen: QuickBASIC Run menu, including the Make EXE File and Make Library items, displayed over the 2D_TRNSF.BAS listing] 

Figure 27: RUN PROGRAM menu 

The program will prompt the user for inputs. The program prompts, with example 
responses, are shown below: 


Enter the image size: 16 
Enter the block size: 16 


Is the image in a file? N 

If the last input had been Y, the program would prompt for an input file name. 
The program at this point produces a 16 x 16 image of pixels having a random 
distribution of intensities, where the intensities are based on a first order Markov 
process. The program continues with the following prompts: 

Which transform would you like to use? 

1. Discrete Cosine Transform (DCT) 

2. Walsh-Hadamard Transform (WHT) [8 or 16 only] 

3. Slant Transform [8 only] 

4. No Transform 

5. Exit 

Enter the number of your choice: 1 

The next set of program prompts will be: 

Which bit allocation would you like to use? 

1. 3 bpp, uniformly distributed 

2. 3 bpp, sub-optimally distributed 

3. 1 bpp, sub-optimally distributed 

4. 0.5 bpp, sub-optimally distributed 

5. Exit 

Enter the number of your choice: 1 

This selection sets the compression, from 8 bpp to 3 bpp. Finally the program prompts 
will be: 

Which quantization scheme would you like to use? 

A. Linear (uniform) quantization 

B. Max quantizer based on Gaussian distribution 

Enter the letter of your choice: B 
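
As an illustration of option A, the following Python sketch mirrors the linquant function of 2D_TRNSF.BAS, which truncates a value toward zero onto a grid of 2^K levels per unit interval. It is a minimal sketch, not the program's code: the Python names are assumptions, and the coefficient quantizers called by the program (clquant and cquant) are not part of the 2D_TRNSF.BAS listing and are not reproduced here.

```python
import math


def linquant(x: float, bits: int) -> float:
    """Linearly quantize x to `bits` bits by truncating toward zero.

    Mirrors FIX(L * x) / L with L = 2 ^ K from the QuickBASIC listing.
    """
    levels = 2 ** bits
    return math.trunc(levels * x) / levels


# An 8-bit pixel value scaled to [0, 1), as the program does with / 256.
pixel = 200 / 256
print(linquant(pixel, 3))  # 0.75
```

With 3 bits the value 0.78125 falls in the cell that starts at 6/8, so the quantized result is 0.75.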

This completes the requests for inputs. The program in response to this set of 
inputs will produce an output display as shown in Figure 28. The output display 
contains a vertical column in the upper left of the figure. This column is used to show 
the pixel intensity level key. The four square blocks, proceeding clockwise from upper 
left, are listed below. 

Figure 28: Program output for a random pattern 

Upper left: First block, 16 x 16, of selected image 

Upper right: 16 x 16 block of transform coefficients. The coefficients are 
ordered by frequency, with the lowest frequency term (DC) in 
the lower right corner, increasing moving up, and increasing 
moving to the left. 

Lower right: The truncated set of transform coefficients. 

Lower left: The image, reconstructed from the truncated set of transform 
coefficients. 
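
The transform, truncate, and reconstruct sequence shown in the four blocks can be sketched in Python as follows. This is an illustration under stated assumptions rather than the program's method: it uses an orthonormal DCT-II written as matrix products, it places the DC term at index [0][0] instead of the lower right corner used by the display, and the helper keep_low (hypothetical) simply zeroes coefficients outside a low-frequency corner, whereas the program truncates according to a bit allocation file.

```python
import math


def dct_matrix(n):
    """Return the n x n orthonormal DCT-II matrix (row u = u-th basis vector)."""
    return [
        [math.sqrt((1 if u == 0 else 2) / n)
         * math.cos((2 * x + 1) * u * math.pi / (2 * n))
         for x in range(n)]
        for u in range(n)
    ]


def matmul(a, b):
    """Plain-list matrix product of a and b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*a)]


def dct2(block):
    """Forward 2-D DCT as two 1-D passes: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def idct2(coeffs):
    """Inverse 2-D DCT: C^T * Y * C (exact because C is orthonormal)."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)


def keep_low(coeffs, k):
    """Zero every coefficient outside the k x k low-frequency corner."""
    return [[v if (u < k and w < k) else 0.0 for w, v in enumerate(row)]
            for u, row in enumerate(coeffs)]


block = [[(x + y) / 16 for y in range(8)] for x in range(8)]  # smooth ramp
approx = idct2(keep_low(dct2(block), 4))
worst = max(abs(a - b) for ra, rb in zip(approx, block) for a, b in zip(ra, rb))
print(f"largest error keeping 16 of 64 coefficients: {worst:.4f}")
```

For an orthonormal transform of an n x n block, the DC coefficient divided by n equals the block mean, which is consistent with the listing passing coeff(0, 0) / n% to its statistics routine.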

The table in the lower left of the figure lists the following results: 


bpp = 3.00     (The average number of bits per pixel in the compressed image.) 

MSE = 0.125 %  (The Mean Square Error for the compressed block.) 

max   0.125 %  (Maximum average block error. Used where the block size 
               is smaller than the image size, requiring more than one 
               block to complete the image.) 

avg   0.125 %  (Combined average error for the number of blocks in the 
               image. Used where the block size is smaller than the 
               image, requiring more than one block to complete the 
               image.) 

NSE = 9.09 %   (Normalized Mean Square Error. Error normalized with 
               respect to the square of the maximum intensity level.) 

max   9.09 %   (Maximum NSE for block. Used where the block size is 
               smaller than the image.) 

avg   9.09 %   (Combined average NSE for the number of blocks used in 
               the image, where the block size is smaller than the 
               image.) 
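
The per-block statistics can be sketched in Python as below. The sketch follows the stats subroutine in the 2D_TRNSF.BAS listing, whose comments define NSE as the square error divided by the AC energy of the original block. The function name and the use of the block mean as the DC level are assumptions made for illustration, and a block with zero AC energy (a perfectly flat block) is not handled.

```python
def block_stats(original, reconstructed):
    """Return (MSE %, NSE %) for one n x n block of intensities in [0, 1)."""
    n = len(original)
    mean = sum(sum(row) for row in original) / n ** 2  # DC level of the block
    sq_err = sum((r - o) ** 2
                 for row_o, row_r in zip(original, reconstructed)
                 for o, r in zip(row_o, row_r))
    ac_energy = sum((o - mean) ** 2 for row in original for o in row)
    mse_pct = 100 * sq_err / n ** 2
    nse_pct = 100 * sq_err / ac_energy
    return mse_pct, nse_pct


original = [[0.0, 1.0], [1.0, 0.0]]
reconstructed = [[0.0, 0.5], [1.0, 0.0]]
print(block_stats(original, reconstructed))  # (6.25, 25.0)
```

The max and avg rows of the table would then be the running maximum and running mean of these two values over the blocks processed so far.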

Figure 29 shows the output display for a stored image of a bubble superimposed 
on a background of random pattern of intensities. The input, in response to the program 
prompt, is shown below: 


Is the image in a file? Y 
Enter the filename: BUBBLE.IMG 


bpp = 3.000 

MSE = 0.308 % 
max   0.308 % 
avg   0.308 % 

NSE = 12.29 % 
max   12.28 % 
avg   12.28 % 

Figure 29: Program output for a bubble pattern 

Figures 30 and 31 show successive output displays for the file BUBBLE.IMG, 
where the image size is 16 x 16 and the block size is 8 x 8. There are 4 blocks required 
to complete the 16 x 16 image. The first block is shown in Figure 30. Three successive 
displays for the three remaining blocks are obtained by depressing the space bar. Note 
the change in error statistics between the two figures. 
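
The order in which the blocks are visited can be sketched in Python as follows. The sketch mirrors the nested K and L loops of the listing, in which the first index of the image array advances with K and the second with L; the generator name iter_blocks is an assumption made for illustration.

```python
def iter_blocks(image, n):
    """Yield ((k, l), block) for each n x n block of a square image.

    The first image index advances with k and the second with l,
    with l varying fastest, as in the QuickBASIC loops.
    """
    size = len(image)
    for k in range(size // n):
        for l in range(size // n):
            block = [row[l * n:(l + 1) * n] for row in image[k * n:(k + 1) * n]]
            yield (k, l), block


# A 16 x 16 test image whose pixel value encodes its own position.
image = [[16 * r + c for c in range(16)] for r in range(16)]
for (k, l), block in iter_blocks(image, 8):
    print((k, l), "first pixel:", block[0][0])
```

A 16 x 16 image with 8 x 8 blocks yields four blocks, matching the four successive displays described above.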


Figure 30: Program output for a bubble pattern using an 8 x 8 block DCT 


bpp = 3. 

MSE = 0.194 % 
max   0.194 % 
avg   0.168 % 

NSE = 7.94 % 
max   7.94 % 
avg   7.35 % 


Figure 31: Program output for the second quarter of bubble pattern image using an 8 x 8 block DCT 

Figure 32 shows the output display for the file EDGE1508.IMG. The display is 
for an image size of 8 x 8 and a block size of 8 x 8, using the DCT, 3 bpp, and the Max 
quantizer. 

Figure 32: Program output for an edge pattern using an 8 x 8 block DCT 


Figure 33 shows the File menu; selecting Exit ends operation of the program. 

Figure 33: File menu, EXIT 

The second program for evaluation of video compression is used to evaluate the 
use of one-dimensional transforms with DPCM. There are two program files: 
XFRMDPCM.BAS, which can be run using QuickBASIC, and an executable file, which 
can be run by typing xfrmdpcm at the DOS prompt. The program will operate with 
image sizes of 8 x 8 and 16 x 16. The block size is restricted to 8 x 8. 

The third program, EDGEDET.BAS, is a simple edge detection algorithm. It 
operates with an image size of 16 x 16 using BUBBLE.IMG, and with an image 
size of 8 x 8 using EDGE .IMG. There is no executable file for this program; 
it must be run using QuickBASIC. 

The enclosed disk contains all of the files, listed in Figure 34, in a directory named 
QBFILES. 

2D_TRNSF.BAS    2D_TRNSF.EXE    2D_TRNSF.MAK 
3B16F.DAT       3B8E.DAT        3B8EH.DAT 
3B8F.DAT        3B8FH.DAT       BUBBLE.IMG 
COSINE.BAS      COSINE.BI       CQ.BI 
CQ1-D.BAS       CQ1-D.BI        CQGAUSS.BAS 
CQLINEAR.BAS    DCT.BAS         DCTDPCM.BAS 
DCTDPCM.BI      EDGE1508.IMG    EDGE50_8.IMG 
EDGEDET.BAS     FASTWHT.BAS     FASTWHT.BI 
HADAMARD.BAS    HADAMARD.BI     HALFTONE.BAS 
HALFTONE.BI     IMAGE.BAS       ORIGINAL.IMG 
QBFILES.$CT 

Figure 34: List of files in QBFILES directory 

The files include executable files for 2D_TRNSF and XFRMDPCM, and 
corresponding files which can be run in QuickBASIC. In addition, there is the file 
EDGEDET.BAS, which can be run in QuickBASIC. The files also include all of the 
necessary FILENAME.BAS subroutine files, IMAGEFILE.IMG files for the bubble 
and edge images, XFRM.BI files for the transforms, and the TRUNC.DAT files used 
in compressing the transform coefficient arrays. 

2D_TRNSF.BAS 


DECLARE SUB plot (n%, x!(), lgrthm%) 
DECLARE SUB transform (n%, its!(), c!(), LUT!(), Xform%, dir$) 
DECLARE FUNCTION NumRemBlanks$ (n%) 
DECLARE SUB stats (n%, i2!(), i!(), dc!, bittotal%) 
DECLARE FUNCTION linquant! (x!, K%) 
DECLARE FUNCTION del! () 
DECLARE SUB picture (imagesiz%, image%()) 
' $INCLUDE: 'COSINE.BI' 
' $INCLUDE: 'FASTWHT.BI' 
' $INCLUDE: 'HALFTONE.BI' 
' $INCLUDE: 'CQ.BI' 
' $INCLUDE: 'SLANT.BI' 

' This program demonstrates the effects of image compression using 
' two-dimensional transform techniques. 

' Written by Marc S. Neustadter, Analex Corporation, 1988. 

DIM SHARED tile$(15) 

' This statement should be removed to produce the identical image every time. 
RANDOMIZE TIMER 

' This subroutine call produces the tiles for the gray scale. 
' See the documentation on the subroutine 'halftone' for details. 
CALL halftone(tile$()) 

' This section prompts for the image size (# of pixels in each dimension) 
' and the transform block size (# of pixels). The arrays are then 
' dimensioned appropriately. 
getimg: SCREEN , , 1, 1 
CLS 0 
LOCATE 7, 10: INPUT "Enter the image size: ", imagesiz% 
LOCATE 8, 10: INPUT "Enter the block size: ", n%    ' The block size is n% x n%. 
REDIM image%(imagesiz% - 1, imagesiz% - 1), B%(n% - 1, n% - 1), image2%(imagesiz% - 1, imagesiz% - 1) 
REDIM intensity(n% - 1, n% - 1), coeff(n% - 1, n% - 1) 
REDIM intensity2(n% - 1, n% - 1), LUT(n% - 1, n% - 1), ilut(n% - 1, n% - 1) 

' This section offers the choice of using a previously stored image 
' or producing a new one. 
LOCATE 10, 10: INPUT "Is the image in a file"; YorN$ 
IF UCASE$(LEFT$(YorN$, 1)) = "Y" THEN 
    LOCATE 11, 10: INPUT "Enter the filename: ", imfile$ 
    OPEN imfile$ FOR INPUT AS #3 
    FOR i = 0 TO imagesiz% - 1 
        FOR j = 0 TO imagesiz% - 1 
            INPUT #3, image%(i, j) 
        NEXT j 
    NEXT i 
    CLOSE #3 
ELSE 
    CALL picture(imagesiz%, image%())    ' Form the original image. 
END IF 

menu: CLS 0 
LOCATE 6, 10: PRINT "Which transform would you like to use?" 
LOCATE 7, 15: PRINT "1. Discrete Cosine Transform (DCT)" 
LOCATE 8, 15: PRINT "2. Walsh-Hadamard Transform (WHT) [8 or 16 only]" 
LOCATE 9, 15: PRINT "3. Slant Transform [8 only]" 
LOCATE 10, 15: PRINT "4. No Transform" 
LOCATE 11, 15: PRINT "5. Exit" 
LOCATE 13, 10: INPUT "Enter the number of your choice: ", Xform% 
CLS 0 
LOCATE 6, 10: PRINT "Which bit allocation would you like to use?" 
LOCATE 7, 15: PRINT "1. 3 bpp, uniformly distributed" 
LOCATE 8, 15: PRINT "2. 3 bpp, sub-optimally distributed" 
LOCATE 9, 15: PRINT "3. 1 bpp, sub-optimally distributed" 
LOCATE 10, 15: PRINT "4. 0.5 bpp, sub-optimally distributed" 
LOCATE 11, 15: PRINT "5. Exit" 
selal: PRINT : LOCATE , 10: INPUT "Enter the number of your choice: ", alloc% 

' The files of bit allocations must exist where they can be found. 
SELECT CASE alloc% 
CASE 1 
    file$ = "3b" + NumRemBlanks$(n%) + "e" 
CASE 2 
    file$ = "3b" + NumRemBlanks$(n%) + "f" 
CASE 3 
    file$ = "1b" + NumRemBlanks$(n%) + "f" 
CASE 4 
    file$ = "1%2b" + NumRemBlanks$(n%) + "f" 
CASE 5 
    GOTO getout 
CASE ELSE 
    PRINT "Invalid choice" 
    GOTO selal 
END SELECT 
bittotal% = 0 
ON ERROR GOTO Handler 
' Loads the bit allocation. 
OPEN file$ + ".dat" FOR INPUT AS #1 
ON ERROR GOTO 0 
FOR i = 0 TO n% - 1 
    FOR j = 0 TO n% - 1 
        INPUT #1, B%(i, j) 
        bittotal% = bittotal% + B%(i, j) 
    NEXT j 
NEXT i 
CLOSE #1 
LOCATE 13, 10: PRINT "Which quantization scheme would you like to use?" 
LOCATE 14, 15: PRINT "A. Linear (uniform) quantization" 
LOCATE 15, 15: PRINT "B. Max quantizer based on Gaussian distribution" 
PRINT : LOCATE 17, 10: INPUT "Enter the letter of your choice: ", quant$ 
quant$ = UCASE$(quant$) 

' Initializes the display. 
SCREEN 8, , 2, 2    ' EGA 640 x 200 with tiling. 
CLS 0 
WINDOW 
VIEW 

' This loop puts the gray scale on the screen. 
FOR K = 0 TO 15 
    LINE (52, 7 * K)-(68, 7 * K + 7), 3, B 
    PAINT (59, 7 * K + 5), tile$(K), 3 
NEXT K 

WINDOW (0, 0)-(n%, n%) 

FOR K = 0 TO imagesiz% \ n% - 1 
FOR L = 0 TO imagesiz% \ n% - 1 

    FOR i = 0 TO n% - 1    ' Get the current block. 
        FOR j = 0 TO n% - 1 
            intensity(i, j) = image%(K * n% + i, L * n% + j) / 256 
        NEXT j 
    NEXT i 

    VIEW (140, 96)-(333, 0) 
    CALL plot(n%, intensity(), 0)    ' Plot the block. 

    ' This section calls the subroutines to produce look-up tables for the 
    ' forward and inverse transforms. The fast WHT doesn't need a look-up table. 
    SELECT CASE Xform% 
    CASE 1 
        CALL mklutcos(n%, LUT(), ilut()) 
    CASE 2 
    CASE 3 
        CALL mklutslant(n%, LUT(), ilut()) 
    CASE 4 
        FOR i = 0 TO n% - 1 
            FOR j = 0 TO n% - 1 
                intensity2(i, j) = linquant(intensity(i, j), B%(i, j)) 
            NEXT j 
        NEXT i 
        GOTO recon 
    CASE ELSE 
        GOTO getout 
    END SELECT 

    ' Transform the block. 
    CALL transform(n%, intensity(), coeff(), LUT(), Xform%, "for") 

    VIEW (407, 96)-(600, 0) 
    CALL plot(n%, coeff(), 1)    ' Plot the coefficients. 

    IF quant$ = "B" THEN 
        CALL cquant(n%, coeff(), B%())    ' Quantize the coefficients. 
    ELSE 
        CALL clquant(n%, coeff(), B%()) 
    END IF 

    VIEW (407, 199)-(600, 103) 
    CALL plot(n%, coeff(), 1)    ' Plot the quantized coeffs. 

    ' Inverse transform 
    CALL transform(n%, coeff(), intensity2(), ilut(), Xform%, "inv") 

    FOR i = 0 TO n% - 1    ' Quantize the reconstructed 
        FOR j = 0 TO n% - 1    ' image to 8 bits. 
            intensity2(i, j) = linquant(intensity2(i, j), 8) 
        NEXT j 
    NEXT i 

recon: 
    VIEW (140, 199)-(333, 103) 
    CALL plot(n%, intensity2(), 0)    ' Plot the reconstructed block. 

    CALL stats(n%, intensity2(), intensity(), coeff(0, 0) / n%, bittotal%) 

    ' Stores the current block in the reconstructed image. 
    FOR i = 0 TO n% - 1 
        FOR j = 0 TO n% - 1 
            image2%(K * n% + i, L * n% + j) = intensity2(i, j) * 256 
        NEXT j 
    NEXT i 
    DO 
    LOOP WHILE INKEY$ = "" 

NEXT L: NEXT K 
SCREEN , , 1, 1 
CLS 0 
LOCATE 7, 15: PRINT "1. Another transform on the same image" 
LOCATE 8, 15: PRINT "2. A new image" 
LOCATE 9, 15: PRINT "3. Exit" 
LOCATE 11, 10: INPUT "Enter the number of your choice: ", action% 
SELECT CASE action% 
CASE 1 
    GOTO menu 
CASE 2 
    GOTO getimg 
CASE ELSE 
    OPEN "original.img" FOR OUTPUT AS #2 
    FOR i = 0 TO imagesiz% - 1 
        FOR j = 0 TO imagesiz% - 1 
            WRITE #2, image%(i, j) 
        NEXT j 
    NEXT i 
    CLOSE #2 

    OPEN "received.img" FOR OUTPUT AS #2 
    FOR i = 0 TO imagesiz% - 1 
        FOR j = 0 TO imagesiz% - 1 
            WRITE #2, image2%(i, j) 
        NEXT j 
    NEXT i 
    CLOSE #2 
    SCREEN 0 
END SELECT 
getout: END 

Handler:    ' Error handling routine. 
    errnum = ERR 
    IF errnum = 53 THEN    ' file not found 
        CLOSE #1 
        PRINT "This choice is not available for the current block size." 
        PRINT "Please make another choice." 
        RESUME selal 
    ELSE 
        ERROR errnum 
    END IF 

FUNCTION del 
' This function returns a value to be used as 
' the increment between successive pixels. 
' It approximates a Gaussian distribution. 
r = 2 * RND(1) - 1 
SELECT CASE 10 * RND(1) 
CASE IS <= 1# 
    d = .0315 * r 
CASE 1# TO 2# 
    d = .0318 * r + SGN(r) * .0315 
CASE 2# TO 3# 
    d = .0332 * r + SGN(r) * .0633 
CASE 3# TO 4# 
    d = .0345 * r + SGN(r) * .0965 
CASE 4# TO 5# 
    d = .038 * r + SGN(r) * .131 
CASE 5# TO 6# 
    d = .042 * r + SGN(r) * .169 
CASE 6# TO 6.5# 
    d = .023 * r + SGN(r) * .211 
CASE 6.5# TO 7# 
    d = .025 * r + SGN(r) * .234 
CASE 7# TO 7.5# 
    d = .029 * r + SGN(r) * .259 
CASE 7.5# TO 8# 
    d = .032 * r + SGN(r) * .288 
CASE 8# TO 8.5# 
    d = .04 * r + SGN(r) * .32 
CASE 8.5# TO 9# 
    d = .051 * r + SGN(r) * .36 
CASE 9# TO 9.5# 
    d = .079 * r + SGN(r) * .411 
CASE 9.5 TO 9.75 
    d = .07 * r + SGN(r) * .49 
CASE 9.75 TO 9.875 
    d = .065 * r + SGN(r) * .56 
CASE ELSE 
    d = .375 * r + SGN(r) * .625 
END SELECT 

' Adjust for the amount of variation desired in the image. 
del = d * 256 / 4 
END FUNCTION 

FUNCTION linquant (x, K%) 
' This function returns the linearly quantized (to K% bits) value of x. 
L& = 2 ^ K% 
linquant = FIX(L& * x) / L& 
END FUNCTION 

FUNCTION NumRemBlanks$ (n%) 
' This function converts a number to a string with all blanks stripped off. 
num$ = STR$(n%) 
NumRemBlanks$ = LTRIM$(RTRIM$(num$)) 
END FUNCTION 

SUB picture (imagesiz%, image%()) 
' This subroutine produces the input image. 

' First column 
image%(0, 0) = INT(256 * RND(1)) 
FOR j = 1 TO imagesiz% - 1 
t0: temp = image%(0, j - 1) + del 
    IF temp < 0 OR temp >= 256 THEN GOTO t0 
    image%(0, j) = INT(temp) 
NEXT j 

' All the rest 
FOR i = 1 TO imagesiz% - 1 
    image%(i, 0) = (image%(i - 1, 0) + image%(i - 1, 1)) \ 2 
    FOR j = 1 TO imagesiz% - 1 
t:      temp = (image%(i - 1, j) + image%(i, j - 1)) / 2 + del 
        IF temp < 0 OR temp >= 256 THEN GOTO t 
        image%(i, j) = INT(temp) 
    NEXT j 
NEXT i 
END SUB 

SUB plot (n%, x(), lgrthm%) 
' This subroutine plots the array x on the screen. 
' lgrthm% should be 1 for a log plot, otherwise the plot will be linear. 
DIM y%(n% - 1, n% - 1) 
PALETTE 4, 4    ' Plot the outlines in red (non-gray). 
FOR i = 0 TO n% - 1 
    FOR j = 0 TO n% - 1 
        IF lgrthm% = 1 THEN 
            ' The argument of the second log function is the base of the log. 
            y%(i, j) = INT(LOG(ABS(x(i, j) + 1E-20)) / LOG(1.5) + 16) 
        ELSE 
            y%(i, j) = INT(x(i, j) * 16) 
        END IF 
        IF y%(i, j) < 0 THEN y%(i, j) = 0 
        IF y%(i, j) >= 16 THEN y%(i, j) = 15 
        LINE (j, i)-(j + 1, i + 1), 4, B    ' Outline the box & tile it. 
        PAINT (j + .5, i + .5), tile$(y%(i, j)), 4 
    NEXT j 
NEXT i 
PALETTE 4, 0    ' Change the outlines to black. 
END SUB 

SUB stats (n%, i2(), i(), dc, bittotal%) STATIC 
' This subroutine computes and displays the statistics 
' for evaluation of the compression method. 
' bpp = bits per pixel 
' MSE = mean square error 
' NSE = normalized square error (square error / AC energy) 
msen = 0: msed = 0 
count = count + 1 
FOR i = 0 TO n% - 1    ' Compute the MSE. 
    FOR j = 0 TO n% - 1 
        msen = msen + (i2(i, j) - i(i, j)) ^ 2 
        msed = msed + (i(i, j) - dc) ^ 2 
    NEXT j 
NEXT i 
VIEW PRINT 16 TO 25 
bpp = bittotal% / n% ^ 2 
mse = 100 * msen / n% ^ 2 
IF mse > msemax THEN msemax = mse 
msetot = msetot + mse 
nse = 100 * msen / msed 
IF nse > nsemax THEN nsemax = nse 
nsetot = nsetot + nse 
PRINT "bpp = "; : COLOR 13: PRINT USING "#.###"; bpp: COLOR 15 
PRINT 
PRINT "MSE = "; : COLOR 14: PRINT USING "#.### _%"; mse: COLOR 15 
PRINT " max "; : PRINT USING "#.### _%"; msemax 
PRINT " avg "; : PRINT USING "#.### _%"; msetot / count 
PRINT 
PRINT "NSE = "; : PRINT USING "###.## _%"; nse 
PRINT " max "; : PRINT USING "###.## _%"; nsemax 
PRINT " avg "; : PRINT USING "###.## _%"; nsetot / count 
END SUB 

SUB transform (n%, its(), c(), LUT(), Xform%, dir$) 
DIM temp(n% - 1, n% - 1), rowi(n% - 1), rowo(n% - 1) 
' This subroutine computes the two-dimensional (forward or inverse) 
' transform of the n% x n% array its. The result is 
' returned in the array c. 

' Perform one-dimensional transform on each row. 
FOR i = 0 TO n% - 1 
    FOR j = 0 TO n% - 1 
        rowi(j) = its(j, i)    ' Includes matrix transpose. 
    NEXT j 
    GOSUB pick 
    FOR u = 0 TO n% - 1 
        temp(i, u) = rowo(u) 
    NEXT u 
NEXT i 

' Transform each column. 
FOR u = 0 TO n% - 1 
    FOR i = 0 TO n% - 1 
        rowi(i) = temp(i, u)    ' Includes matrix transpose. 
    NEXT i 
    GOSUB pick 
    FOR v = 0 TO n% - 1 
        c(u, v) = rowo(v) 
    NEXT v 
NEXT u 
GOTO done 

' ***** subroutine ***** 
pick: 
    SELECT CASE Xform% 
    CASE 1 
        IF dir$ = "for" THEN 
            CALL DCT1D(n%, rowi(), rowo(), LUT()) 
        ELSEIF dir$ = "inv" THEN 
            CALL IDCT1D(n%, rowi(), rowo(), LUT()) 
        END IF 
    CASE 2 
        CALL WHT1D(n%, rowi(), rowo()) 
    CASE 3 
        CALL SLANT1D(n%, rowi(), rowo(), LUT()) 
    END SELECT 
    RETURN 

done: END SUB 